Foreword


This manual has been produced to familiarize data users with the overall Education 
Longitudinal Study of 2002 (ELS:2002) design. It also seeks to familiarize users with the 
procedures followed for data collection and processing over the four rounds of the study, with 
particular emphasis on the third follow-up. Additionally, this document provides information that 
will support research and policy analyses. The National Center for Education Statistics hopes 
that analysts will find that the ELS:2002 data are organized and equipped in a manner that 
facilitates users’ straightforward production of statistical summaries and analyses of the 
ELS:2002 youth cohorts as they embark on the transition from high school to postsecondary 
education and to the labor market, and assume yet other roles that serve as markers of newly 
achieved adult status. 

Jeffrey A. Owings
Chief 

Longitudinal Studies Branch 
Chapter 1.
Introduction 


1.1 Overview of the Data File Documentation 

This report provides guidance and documentation for users of the combined base-year to 
third follow-up data of the Education Longitudinal Study of 2002 (ELS:2002). ELS:2002 is 
sponsored by the National Center for Education Statistics (NCES) of the Institute of Education 
Sciences, U.S. Department of Education. The base-year and follow-up studies were conducted 
through a contract to RTI International, 1 a university-affiliated, nonprofit research organization 
based in North Carolina. This manual contains information about the purposes of ELS:2002, the
base-year to third follow-up data collection instruments, the sample design, and data collection 
and processing procedures. The manual provides guidance for understanding and using data from 
all components of the base year and its three follow-ups. 

The ELS:2002 dataset has been produced in both restricted-use and public-use versions. 
This document is based on the public-use data files; however, much of the assembled material is 
relevant to the restricted-use files as well. The public release data files reflect alteration or 
suppression of some of the original data. Such edits were imposed to minimize the risk of 
disclosing the identity of responding schools and individuals. 

Although this document supplies needed information from the ELS:2002 base year and 
follow-ups, the first three rounds are treated in summary fashion, because the earlier guides to 
the study data remain available and provide information in more detail. 2 Chapter 1 addresses 
three main topics. First, it supplies an overview of the NCES education longitudinal studies 
program, thus situating ELS:2002 in the context of the earlier NCES high school cohorts studied 
in the 1970s, 1980s, and 1990s, to which ELS:2002 can at various points (including the third 
follow-up) be compared. Second, it introduces ELS:2002 by delineating its principal objectives. 
Third, it provides an overview of the base-year to third follow-up study designs. 

In subsequent chapters, additional topics are addressed: instrumentation (chapter 2); 
sample design and weighting (chapter 3); data collection methods and results (chapter 4); data 
preparation and processing (chapter 5); weighting, imputation, and design effects (chapter 6); 
and data file contents (chapter 7). Appendixes include variable lists; questionnaire facsimile; 
third follow-up composite variables; glossary of terms; cross-cohort comparisons to the National


1 RTI International is a trade name of Research Triangle Institute. 

2 For the 2002 base-year field test, see Burns et al. 2003 (NCES 2003-03) and for the base-year main study Ingels et al. 2004
(NCES 2004-405). For the 2004 first follow-up, Ingels et al. 2005 (NCES 2006-344). For the ELS:2002 high school transcript 
study, restricted documentation is in Bozick et al. 2006 (NCES 2006-338). Ingels et al. 2007 (NCES 2008-347) documents the 
full-scale second follow-up study and summarizes the high school transcript study. 
Education Longitudinal Study of 1988 (NELS:88) in 2000; standard error and design effects
tables; average weight adjustment factors; nonresponse bias; and across-round cross-walk of 
occupation codes. 

1.2 Historical Background 

1.2.1 NCES Education High School Longitudinal Studies Program 

Because ELS:2002 was designed to be closely comparable to prior NCES secondary 
school cohort studies, it is important to understand ELS:2002’s predecessors. In response to its 
mandate to “collect and disseminate statistics and other data related to education in the United 
States” and the need for policy-relevant, nationally representative longitudinal samples of 
elementary and secondary students, NCES instituted the Secondary Longitudinal Studies 
program. The aim of this continuing program is to study the educational, vocational, and 
personal development of students at various stages in their educational careers and the personal, 
familial, social, institutional, and cultural factors that may affect that development. 

The Secondary Longitudinal Studies program comprises three completed studies: the 
National Longitudinal Study of the High School Class of 1972 (NLS:72), the High School and 
Beyond (HS&B) longitudinal study of 1980, and NELS:88. In addition, base-year to third 
follow-up data for ELS:2002, the fourth longitudinal study in the series, are now available, with 
the exception of postsecondary educational transcript data, which are currently being collected 
and processed. 

A fifth study (successor to ELS:2002) is the High School Longitudinal Study of 2009 
(HSLS:09). HSLS:09 began with ninth-graders in 2009 with a follow-up in the spring term of 
2012. Further data collections (both interviews and transcripts) are scheduled, including 2013 
interview and transcript collections. 

Taken together, these studies describe (or will describe) the educational experiences of 
students from five decades. Figure 1 includes a temporal presentation of these five longitudinal 
education studies and highlights their component and comparison points. 
Figure 1. Longitudinal design for the NCES high school cohorts: 1972-2016

1.2.2 National Longitudinal Study of the High School Class of 1972

The secondary longitudinal studies program began more than 40 years ago with the 
implementation of NLS:72. 3 NLS:72 was designed to provide longitudinal data for education 
policymakers and researchers who link educational experiences in high school with important 
downstream outcomes such as labor market experiences and postsecondary education enrollment 
and attainment. With a national probability sample of 19,001 high school seniors from 1,061 
public and religious and other private schools, the NLS:72 sample was representative of 
approximately 3 million high school seniors enrolled in 17,000 U.S. high schools during the 
spring of the 1971-72 school year. Each student was asked to complete a questionnaire and a 
cognitive test battery. In addition, administrators at the sample members’ schools were asked to 
supply information about the schools’ programs, resources, and grading systems, as well as
survey data on each student. NLS:72 also included a counselor survey. Five follow-up surveys 
were completed with this cohort, with the final data collection taking place in 1986, when the 
sample members were 14 years removed from high school and approximately 32 years old. 
Postsecondary education transcripts were collected from the institutions attended by students. All 
of the secondary longitudinal studies from NLS:72 forward have had, as ELS:2002 will have, a 
final data collection comprising postsecondary education transcripts. 

1.2.3 High School and Beyond 

The second in the series of NCES longitudinal studies was launched in 1980. HS&B 
included one cohort of high school seniors comparable to the NLS:72 sample; however, the 
study also extended the age span and analytical range of NCES longitudinal studies by surveying 
a sample of high school sophomores. Base-year data collection took place in the spring term of
the 1979-80 academic year with a two-stage probability sample. More than 1,000 schools served 
as the first-stage units, and 58,000 students within these schools were the second-stage units. 
Students completed a questionnaire and a cognitive test battery. Both cohorts of HS&B 
participants were resurveyed in 1982, 1984, and 1986; the sophomore group also was surveyed 
in 1992. 4 In addition, to better understand the school and home contexts for the sample members, 
data were collected from teachers (a teacher comment form in the base year asked for teacher 
perceptions of HS&B sample members), principals, and a subsample of parents. High school 
transcripts were collected for a subsample of sophomore cohort members. As in NLS:72, 
postsecondary transcripts were collected for both HS&B cohorts. 


3 For documentation on NLS:72, see Riccobono et al. (1981) and Tourangeau et al. (1987). 

4 For a summation of the HS&B sophomore cohort study, see Zahs et al. (1995). For further information on HS&B, see the 
NCES website: http://nces.ed.gov/surveys/hsb/ .

1.2.4 National Education Longitudinal Study of 1988

Much as NLS:72 captured a high school cohort of the 1970s and HS&B captured high 
school cohorts of the 1980s, NELS:88 was designed to study high school students of the 1990s — 
but with a baseline measure of their achievement and status, prior to their entry into high school. 
NELS:88 tracked students from junior high or middle school through secondary and 
postsecondary education, labor market experiences, and marriage and family formation. Data
collection for NELS:88 was initiated with the eighth-grade class of 1988 in the spring term of the
1987-88 school year. Along with a student survey, NELS:88 included surveys of parents (base 
year and second follow-up), teachers (base year, first and second follow-ups), and school 
administrators (base year, first and second follow-ups). The cohort was also surveyed twice after 
their scheduled high school graduation, in 1994 and 2000. 5 High school transcripts were 
collected in the fall of 1992 and postsecondary transcripts in the fall of 2000. Through a process 
of sample freshening, NELS:88 offers three nationally representative cohorts of students: spring- 
term 8th-, 10th-, and 12th-graders. 

1.2.5 High School Longitudinal Study of 2009

HSLS:09, the successor study to ELS:2002, is not directly comparable to the other
secondary longitudinal studies in that it takes different measuring points (fall term of 9th grade
and spring term of 12th grade). Nonetheless, it models the same transition (through high school
to postsecondary education, the work force, and young adulthood) that is studied by NLS:72,
HS&B, NELS:88, and ELS:2002, and it uses many of the same methods.

HSLS:09 students were administered both a questionnaire and an assessment in algebraic 
reasoning, in both the base year (2009) and first follow-up (2012). HSLS:09 also captures 
contextual data sources, with a base-year survey of teachers, and base-year and first follow-up 
surveys of administrators, counselors, and parents. A 2013 update survey targets students or 
parents, and a high school transcript study is being conducted as well. Another major data 
collection will take place in 2016. 6

1.3 Education Longitudinal Study of 2002 

ELS:2002 represents a major longitudinal effort designed to provide trend data about 
critical transitions experienced by students as they proceed through high school and into 
postsecondary education or their careers. The 2002 sophomore cohort is being followed, initially 
at 2-year intervals, to collect policy-relevant data about educational processes and outcomes, 


5 The entire compass of NELS:88, from its baseline through its final follow-up in 2000, is described in Curtin et al. (2002). 

NCES maintains an updated version of the NELS:88 bibliography on its website. The bibliography encompasses both project 
documentation and research articles, monographs, dissertations, and paper presentations employing NELS:88 data (see 
http://nces.ed.gov/surveys/nels88/Bibliography.asp ).

6 For further information about HSLS:09, please see Ingels et al. 2013 (NCES 2014-361), the High School Longitudinal Study of
2009 (HSLS:09) Base-Year to First Follow-up Data File Documentation. 
especially as such data pertain to student learning, predictors of dropping out, and high school
effects on students’ access to, and success in, postsecondary education and the workforce. 

In the spring term 2002 base year of the study, high school sophomores were surveyed
and assessed in a national sample of high schools with 10th grades. Their parents, teachers, 
principals, and librarians were surveyed as well. 

In the first of the follow-ups, base-year students who remained in their base-year schools 
were resurveyed and tested (in mathematics) 2 years later, along with a freshening sample that 
makes the study representative of Spring 2004 high school seniors nationwide. Students who had 
transferred to a different school, switched to a homeschool environment, graduated early, or 
dropped out were administered a questionnaire but were not given the assessment, which took 
place only in a school setting. In the fall and winter after the 2004 first follow-up, high school 
transcripts were collected. The transcript data file is comparable to the high school transcript 
studies of HS&B and NELS:88, as well as the National Assessment of Educational Progress 
(NAEP) high school transcripts. 

In the second follow-up (2006), data were collected from sample members, using 
questionnaires self-administered on the Web, or through interviewer-administered electronic 
instruments. In addition, administrative records were pursued, such as the National Student Loan 
Data System and Free Application for Federal Student Aid (FAFSA). 

This section introduces ELS:2002, lists some of the major research and policy issues that 
the study addresses, and explains the three levels of analysis — cross-sectional, cross-cohort and 
trend, and longitudinal — that can be conducted with ELS:2002 data. 

1.3.1 ELS:2002 Study Design 

ELS:2002 is designed to monitor the transition of a national sample of young people as 
they progress from 10th grade through high school and on to postsecondary education or the 
world of work, or both. 

ELS:2002 has two distinctive features. First, it is a longitudinal study, in which the same 
units (schools and students) are surveyed repeatedly over time. Individual students were 
followed through high school and for 8 years thereafter. The base-year schools were surveyed 
twice, in 2002 and in 2004. Second, in the high school years, ELS:2002 is an integrated, 
multilevel study that involves multiple respondent populations. The respondents include 
students, their parents, their teachers, and their schools (from which data are collected at four 
levels: from the principal, the librarian, a facilities checklist, and school course catalogues and 
records, which support a course offerings component in the first follow-up transcript study). 

Each of the two distinctive features — the longitudinal nature of the ELS:2002 design and its 
multilevel focus — will be explained in greater detail below. 

The transition through high school and beyond into postsecondary institutions and the 
labor market is both complex (youth may follow many paths) and prolonged (it takes place over 
a period of years). The complexity and time frame for this transition make longitudinal
approaches especially appropriate. By surveying the same young people over time, it is possible 
to record the changes taking place in their lives. It is also possible to gather information about the
ways that earlier achievements, aspirations, and experience predict what happens to the
respondents later. In the baseline data collection (Spring 2002), ELS:2002 measured students’
tested achievement in reading and mathematics. ELS:2002 also obtained information from
students about their attitudes and experiences. 

These same students were resurveyed 2 years later (in 2004), in the ELS:2002 first 
follow-up, to measure changes such as achievement gains in mathematics and changes in 
enrollment status (e.g., the situation of students who drop out of school compared with those who 
persist in their education). Resurveyed 4 years after the base year (2006), second follow-up data 
supply information about postsecondary educational access and choice, or transition to the labor
market for cohort members who did not continue their education. In the third follow-up, sample 
members were interviewed between July 2012 and February 2013 to collect the study’s final 
outcomes (e.g., persistence in higher education, sub-baccalaureate and baccalaureate attainment, 
transition into the labor market). Postsecondary transcripts are also being collected as part of the 
third follow-up, and restricted-use documentation for this component will be provided 
separately. 

ELS:2002 gathers information at multiple levels. It obtains information not only from
students and their school records, but also from students’ parents, teachers, and the
administrators (principal and library media center director) of their schools. Data from their
teachers, for example, provide information about both the student’s and the teacher’s
backgrounds and activities. This multilevel focus supplies researchers with a comprehensive
picture of the home, community, and school environments and their influences on the student. 

This multiple-respondent perspective is unified by the fact that, for most purposes, the 
student is the basic unit of analysis. In addition, administrative records have been pursued and 
made part of the ELS:2002 database, ranging from the decennial Census and the NCES Integrated
Postsecondary Education Data System (IPEDS) to student application and loan sources
(the Central Processing System [CPS] for the FAFSA, and the National Student Loan Data System). Using this
multilevel and longitudinal information, the base year (2002) and first follow-up (2004) of 
ELS:2002 help researchers and policymakers explore and better understand such issues as the 
importance of home background and parental aspirations for student success; the influence of 
different curriculum paths and special programs; the effectiveness of different high schools; and 
whether a school’s effectiveness varies with its size, organization, climate or ethos, curriculum, 
academic press, or other characteristics. These data facilitate understanding of the impact of 
various instructional methods and curriculum content and exposure in bringing about educational 
growth and achievement. 

With the addition of postsecondary data in the 2006 second follow-up, ELS:2002 greatly
enlarged its ability to connect high school antecedents to later outcomes. For students who 
continue on to higher education, researchers can use ELS:2002 to measure the effects of their 
high school careers on subsequent access to postsecondary institutions, their choices of 
institutions and programs, and as time goes on, their postsecondary persistence, attainment, and 
eventual entry into the labor force and adult roles. For students who go directly into the 
workforce (whether as dropouts or high school graduates), ELS:2002 can help to determine how 
well high schools have prepared these students for the labor market and how they fare within it. 
The third follow-up (2012) data illuminate the final outcomes of the study, speaking in particular 
to issues of postsecondary attainment, persistence, and entry into and success in the labor market. 

Key elements in the ELS:2002 longitudinal design are summarized by round below. 

Base Year (2002) 

• Completed baseline survey of high school sophomores in the spring term of 2002. 

• Completed cognitive tests in reading and mathematics. 

• Completed survey of parents, English teachers, and mathematics teachers. Collected 
school administrator questionnaires. 

• Included additional components for this study — a school facilities checklist and a 
media center (library) questionnaire. 

• Established sample sizes of approximately 750 schools and more than 17,000 
students. Schools are the first-stage unit of selection, with sophomores randomly 
selected within schools. 

• Oversampled Asian 7 and Hispanic students and private schools. 

• Designed linkages with the Program for International Student Assessment (reading
and math) and NAEP (2005 math); score reporting linkages to the prior longitudinal
studies.
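
The two-stage selection noted in the base-year summary (schools as the first-stage unit, then sophomores randomly selected within schools) can be illustrated with a minimal sketch. This is not the ELS:2002 sampling procedure itself, which is documented in chapter 3. The function name, the toy sampling frame, and the use of equal-probability selection at both stages are assumptions made purely for illustration, and the sketch ignores the oversampling described in the base-year summary.

```python
import random


def two_stage_sample(schools, n_schools, n_per_school, seed=0):
    """Draw a simple two-stage sample.

    schools: dict mapping a school id to a list of student ids.
    Equal-probability selection at both stages is an illustrative
    assumption, not the documented ELS:2002 design.
    """
    rng = random.Random(seed)
    # Stage 1: select schools from the frame.
    chosen = rng.sample(sorted(schools), n_schools)
    sample = {}
    for school_id in chosen:
        roster = schools[school_id]
        # Stage 2: select students at random within each chosen school.
        k = min(n_per_school, len(roster))
        sample[school_id] = rng.sample(roster, k)
    return sample


# Hypothetical frame: 20 schools with 30 sophomores each.
frame = {
    f"S{i:03d}": [f"S{i:03d}-{j:02d}" for j in range(30)]
    for i in range(20)
}
sample = two_stage_sample(frame, n_schools=5, n_per_school=4)
```

Because students are nested within sampled schools, each selected student carries a school identifier, which is what allows student records to be linked to school-level data in a multilevel design.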

First Follow-up (2004) 

• Most sample members were seniors, but some were dropouts or in other grades (early 
graduates or retained in an earlier grade). 

• Student questionnaire (different versions for students who remained in the base-year 
school, transferred to a new school, completed high school early, or were 
homeschooled), dropout questionnaire, assessment in mathematics, and school 
administrator questionnaire were administered. 


7 Except where indicated otherwise, the race/ethnicity variable for this report includes six categories: (1) American Indian or
Alaska Native; (2) Asian or Native Hawaiian or other Pacific Islander; (3) Black, including African American; (4) Hispanic or
Latino; (5) More than one race; and (6) White. All race categories exclude individuals of Hispanic or Latino origin.

• Returned to the same schools but separately followed transfer students and surveyed
them outside of school. 

• Freshened for a senior cohort. 

• High school transcript component in 2004 (coursetaking records at the student level 
for grades 9-12) and course offerings component at the school level. 

Second Follow-up (2006) and Third Follow-up (2012, 2013) 

• Survey 2 and 8 years after scheduled high school graduation. 

• Post-high school follow-up with web-based instrument for self-administration, 
computer-assisted telephone interview, or computer-assisted personal interview. 

• Use of administrative records sources to augment information about ELS:2002 
sample members, including SAT, ACT, General Educational Development, CPS, and 
NSLDS. 

• Postsecondary education transcripts. 

1.3.2 ELS:2002 Research and Policy Issues 

Apart from helping to describe the status of high school students and their schools, 
ELS:2002 will provide information to help address a number of key policy and research
questions. The study is intended to produce a comprehensive dataset for the development and 
evaluation of education policy at all government levels. Part of its aim is to inform decision- 
makers, education practitioners, and parents about the changes in the operation of the educational 
system over time and the effects of various elements of the system on the lives of the individuals 
who pass through it. Issues that can be addressed with data collected in the high school years 
include the following: 

• students’ academic growth in mathematics; 

• the process of dropping out of high school; 

• the role of family background and the home education support system in fostering 
students’ educational success; 

• the relationship between coursetaking choices and success in the high school years 
(and thereafter); 

• the distribution of educational opportunities as registered in the distinctive school 
experiences and performance of students from various subgroups; such subgroups 
include the following: 

- students in public and private high schools; 

- language minority students; 

- students with disabilities; 

- students in urban, suburban, and rural settings; 

- students in different regions of the country;

- students from upper, middle, and lower socioeconomic status levels; 

- male and female high school students; and 

- students from different racial or ethnic groups. 

• steps taken to facilitate the transition from high school to postsecondary education or 
the world of work. 

Now that most ELS:2002 students have completed high school, a new set of issues can be 
examined with the help of data collected in 2006. These issues include the following: 

• the later educational and labor market activities of high school dropouts; 

• the transition of those who do not go directly on to postsecondary education or to the 
world of work; 

• access to and choice of undergraduate and graduate educational institutions. 

The 2012 data support further investigations: 

• persistence in attaining postsecondary educational goals; 

• rate of progress through the postsecondary curriculum; 

• degree attainment; 

• barriers to persistence and attainment; 

• impacts of educational indebtedness; 

• entry of new postsecondary graduates into the workforce; 

• social and economic rate of return on education to both the individual and society; 
and 

• adult roles, such as family formation and civic participation. 

These various research and policy issues can be investigated at several distinct levels of analysis. 
The overall scope and design of the study provide for the three following analytical levels: 

• cross-sectional profiles of the nation’s high school sophomores (2002), seniors 
(2004), and dropouts in the last 2 years of high school (2004); 

• longitudinal analysis (including examination of life course changes); 

• intercohort comparisons with American high school students of earlier decades. 

Data users may benefit from consulting and building on past research publications that 
use the ELS:2002 data. A bibliography is maintained at 
http://nces.ed.gov/surveys/els2002/Bibliography.asp .

1.3.2.1 Cross-sectional Profiles

Cross-sectional data permit characterization of the nation’s high school sophomores in 
the spring term of the 2001-02 school year (the base year) and (owing to a freshening process), 
high school seniors in 2004 (the first follow-up). 

1.3.2.2 Longitudinal Analysis

The primary research objectives of ELS:2002 are longitudinal in nature. The study 
provides the basis for within-cohort comparison by following the same individuals over time to 
measure achievement growth in mathematics and monitor enrollment status over the high school 
years (base year, first follow-up, and high school transcripts). With the second and third
follow-ups, as well as postsecondary transcript data, the study records such key outcomes as
postsecondary entry and attainment, labor market experiences, and family formation. In turn,
these outcomes can be related to antecedents identified in earlier rounds, including individual, 
home, school, and community factors. 

1.3.2.3 Intercohort Comparisons

Intercohort comparisons can be made either cross-sectionally or longitudinally. The
cross-sectional approach compares historically distinct cohorts on a time-lag basis, using a
temporal series anchored in sameness of grade, with the intent of identifying and measuring
trends. The longitudinal approach takes the history of the age/grade cohort into account and
models it vis-a-vis other panels. The cross-sectional time-lag approach has more often been used in the
NCES secondary longitudinal studies than the intercohort panel approach. As part of an 
important historical series of studies that repeats a core of key items each decade, ELS:2002 
offers the opportunity to analyze trends in areas of fundamental importance, such as patterns of 
overall and subgroup coursetaking, rates of participation in extracurricular activities, academic 
performance, and changes in goals and aspirations. Cross-cohort comparisons can be made with
high school sophomores (HS&B, NELS:88, and ELS:2002), and seniors (including coursetaking 
information gathered from secondary longitudinal studies transcripts in HS&B, NELS:88, and 
ELS:2002 as well as NAEP transcripts for graduates in 1987, 1990, 1994, 1998, 2000, 2005, and 
2009). Analysis can also be conducted with high school cohorts in the postsecondary years 
(NLS:72, HS&B, NELS:88, and ELS:2002 senior cohorts 2 years out of high school; NELS:88 
and ELS:2002 8 years out of high school). 

Chapter 2.
Instrumentation

2.1 Base-year to Second Follow-up Instruments 

The base-year (2002) data collection instruments for the Education Longitudinal Study of 
2002 (ELS:2002) consisted of five questionnaires (student, parent, teacher, school administrator, 
and library media center), two achievement tests (assessments in reading and mathematics), and 
a school observation form (facilities checklist). 

The first follow-up (2004) data collection instruments comprised seven questionnaires: a 
student questionnaire (in full-length and abbreviated forms), a transfer student questionnaire, a 
new participant student questionnaire (administered to students who were part of the freshened 
sample or base-year nonrespondents), a homeschool student questionnaire, an early graduate 
questionnaire, a dropout (not currently in school) questionnaire, and a school administrator 
questionnaire. The first follow-up also included an achievement test in mathematics as well as a 
high school transcript component, with academic transcripts collected from fall of 2004 to early 
2005. 

In the second follow-up (2006), there was a single questionnaire administered to study 
participants (the eligible student cohort members sampled in the base year or brought in through 
freshening in the first follow-up), designed in an electronic format, for self- or interviewer
administration. 

The base-year and first, second, and third follow-up questionnaires can all be found on 
the National Center for Education Statistics (NCES) ELS:2002 website 
(http://nces.ed.gov/surveys/els2002/questionnaires). The base-year (Burns et al. 2003, NCES
2003-03), first follow-up (Ingels et al. 2004, NCES 2004-405), and second follow-up (Ingels et 
al. 2007, NCES 2008-347) data file documentation are also available from the NCES website. 
The high school transcript study is documented in Bozick et al. 2006 (NCES 2006-338)
(document restricted to data license holders), and Ingels et al. 2007 (NCES 2008-347) (public 
use document). 

2.2 Third Follow-up Instrument Development Process and 
Procedures 

The ELS:2002 third follow-up questionnaire was designed for electronic
self-administration (web) or computer-assisted interviewer administration (computer-assisted
telephone interview [CATI] or computer-assisted personal interview [CAPI]). Items were selected
primarily for their intracohort value, that is, their relevance, as final outcomes, to the antecedent 
or predictor variables gathered in earlier rounds. Of secondary importance was the intercohort 
value of items, that is, whenever possible variables were used which would prove comparable to 
those employed in the final round of NELS:88 in 2000, when the NELS:88 cohort was
approximately the same age (and years beyond high school) as the third follow-up ELS:2002 
sample (for a cross-walk between NELS:88 in 2000 and ELS:2002 in 2012, see appendix B of 
this document; for further information on the NELS:88 fourth follow-up, see Curtin et al. 2002
[NCES 2002-323] and Ingels et al. 2002 [NCES 2002-321]). 

In general, the development and review process for the third follow-up questionnaire 
consisted of the following nine steps. 

First, a literature review was conducted. This review covered the following domains: 

(1) Employment and career outcomes, including labor force activities and establishing careers, 
military employment and careers, and job-related training, certification, and licensure; 

(2) Education outcomes, including high school completion, postsecondary education enrollment 
history, postsecondary education attainment (credentials earned), retrospective on the college 
experience, educational expectations and aspirations, other adult education, social-cognitive 
measures of job orientation, and educational debt and financial aid; and 

(3) Other outcomes reflective of attaining adult status, including geographic location and living 
arrangements, family formation, income and assets, health, perceived adulthood, life and career 
values, and civic engagement, such as community service and voting. 

This review guided subsequent discussions with the Technical Review Panel (TRP) (see below 
for an account of the role of this body). 

Second, questionnaire drafts were circulated across programs at NCES and between RTI 
International and NCES, and revisions made on the basis of the resulting comments. 

Third, new topics and constructs, and prioritizations of these and past constructs, were 
recommended in commissioned papers from three ELS:2002 TRP members, representing three 
distinct academic disciplines — psychology (particularly social cognitive theory), economics 
(particularly labor economics), and sociology (particularly life course perspective). These 
memoranda were shared with the other panelists. 

Fourth, certain items that were both new and of critical importance were tested in two 
series of cognitive interviews. The first series of cognitive interviews (pre-field test) focused on 
social cognitive career theory items that were developed for ELS:2002 by Professor Robert Lent; 
the second series (pre-full-scale study) focused on potential new items concerning financial aid 
and economic literacy. 

Fifth, before and after the field test, draft instruments were reviewed by the ELS:2002 
third follow-up TRP, a specially appointed independent group of substantive, methodological, 
and technical experts. The panel met twice in 2-day meetings (September 2010 and November 
2011). 

Sixth, justifications were written for questionnaire items for Office of Management and 
Budget (OMB) review. 

Seventh, actual development of the questionnaire, including specification, routing, 
programming, and testing, took place within an electronic survey editor. 

Eighth, a field test was conducted to test questionnaire items; results of the field test are 
documented in the field test methodology report (Ingels et al. 2012, NCES 2012-03). 

Ninth, on the basis of the field test, recommendations were generated for the ELS:2002 
third follow-up questionnaire, and full-scale clearance sought and received from OMB, after a 
final review of the proposed questionnaire items. 

Although the questionnaire is a key source of information for the study, it should be 
noted that the survey component will be complemented by a postsecondary transcript collection, 
and the tapping of ancillary administrative records data. 

2.3 Third Follow-up Instrument Content 

As noted earlier in chapter 1, because the primary research objectives of ELS:2002 are 
longitudinal in nature, the first priority for questionnaire development was to select the items that 
would prove most useful in a longitudinal context — in the third follow-up, these items are the 
outcomes, predicted by the antecedents measured in past survey rounds. The second priority was 
to obtain needed cross-sectional data, whenever consistent with the longitudinal objectives, 
particularly data that could be used for intercohort comparison with past studies or linkage to 
certain current data collection efforts. Wherever possible, all ELS:2002 instruments were 
designed to provide continuity and consistency with the earlier education longitudinal studies of 
high school cohorts. For the third follow-up, there is only one strictly comparable point in time 
with which to link on a cross-cohort basis, the NELS:88 fourth follow-up in 2000. 

Below, content of the third follow-up questionnaire is summarized, followed by a 
summary of the abbreviated version of the instrument. 

Current status. The interview asks about the respondent’s current activities, such as labor 
market status and educational status. 

High school completion. For sample members who had not completed high school (or 
General Educational Development [GED]) by the second follow-up or whose completion status 
was unknown, the third follow-up interview obtained updated information. 

Postsecondary education. This section of the interview focused on postsecondary 
enrollment and attainment at all levels of credentialing and degree completion, and it included 
all forms and levels of sub-baccalaureate, baccalaureate, and graduate and professional 
enrollment. It also gathered information such as primary or secondary major or program of study. 
First, sample members were asked to identify postsecondary institutions they had attended; 
second, they were asked to identify any postsecondary credentials earned. Attendance 
information will be used to conduct the postsecondary education transcript component of the 
study in 2013-14. Reasons for leaving school were also elicited. 

The college experience. Although most sample members were out of school by 2012, it 
was still possible to ask some retrospective questions about the college experience and its 
perceived role and impact as carried into the mid-twenties of the cohort. 

Education finance. The questionnaire explores the issue of educational borrowing and its 
impact. Information about receipt of scholarships, fellowships, and grants is also obtained. 

Educational expectations. Following in the tradition of the prior NCES high school 
cohort studies, all respondents were asked to report the highest level of education they expected 
to achieve by age 30 (average age at time of interview is about 26). 

Employment and income. The interview gathered information on employment and 
income. A brief employment history is collected. Respondents were asked to answer a series of 
questions about job title and duties, hours worked, earnings, and employer type. Both the 
employed and the unemployed were asked about perceived employment barriers they may have 
faced or be facing. All respondents were asked for their annual income and whether they have 
any dependents. These questions will allow analysts to roughly estimate net earnings after taxes. 
The questionnaire asked separately about employment through the military. In addition, some 
scales on job orientation and satisfaction have been added; these are informed by social-cognitive 
career theory and were written specially for ELS:2002. 

Family formation. The interview collected information about family life and civic 
engagement (including both voting and community service). With respect to family, the 
questions determine marital status, whether the respondent has children, and members of the 
household. 

Life values. As included in earlier rounds of ELS:2002 as well as in some of the prior 
NCES secondary longitudinal studies, questions are asked about the life values (acquisition of 
money, friendships, helping others, a good marriage, etc.) that are important to the respondent. 

Additional topics. Additional topics include civic engagement, assets/debt, and 
certification and licensure. 

The third follow-up questionnaire is available at several levels. First, and simplest, the 
ELS:2002 homepage on the NCES website includes a facsimile of the questionnaire without 
routing logic. A facsimile of the questionnaire with routing logic appears in appendix C of this 
document. Finally, a flow chart of the questionnaire may be found in appendix D of this 
document. 

2.4 Third Follow-up Abbreviated Questionnaire 

In part because of the length of time between the second and third follow-ups and the 
busy lives of the study participants, and to give response rates an added boost, an abbreviated 
questionnaire was offered in place of the full-length questionnaire for the final 4 weeks of third 
follow-up data collection (early January through early February 2013). At this late point in data 
collection, the abbreviated questionnaire was completed by approximately 1 percent of 
participants eligible for the third follow-up. In creating the abbreviated questionnaire, 
a subset of full-length questionnaire items was selected such that (1) the overall average length 
of the abbreviated questionnaire would be approximately 10 minutes (as opposed to the 
approximately 35-minute full-length questionnaire), and (2) questionnaire items necessary for a 
successful postsecondary transcript data collection (e.g., names of attended postsecondary 
schools) and other items of key analytic importance would be retained. Items retained for the 
abbreviated questionnaire included those regarding current activities, high school completion (if 
unknown or incomplete as of the second follow-up), postsecondary schools attended, 
postsecondary credentials earned, educational expectations, current/most recent job type 
(including rate of pay at that job), past/present unemployment status, expected age-30 
occupation, family formation (marital status and biological children), 2011 employment 
earnings, and current debt. 

The format of abbreviated questionnaire items (in terms of question wording, response 
options, etc.) was identical to the format of those same items as they appeared in the full-length 
questionnaire. To see the full set of abbreviated questionnaire items, please see the items marked 
with a triple asterisk (***) in appendix C (facsimile of the third follow-up questionnaire) of 
this report. 

Administrative records linkages will be implemented in the course of the postsecondary 
transcript collection (for example, SAT and ACT scores). Information from GED, Central 
Processing System/Free Application for Federal Student Aid, and National Student Loan Data 
System records is included in the current data delivery and is described in detail in appendix K. 

Chapter 3. 
Sample Design 

3.1 Base-year, First Follow-up, and Second Follow-up Sample 
Design 

3.1.1 Overview 

This chapter describes the Education Longitudinal Study of 2002 (ELS:2002) base-year 
to third follow-up sample designs. 

Section 3.1 provides a historical summary of sample design issues for the base year, first 
follow-up, and second follow-up. Section 3.2 provides an expanded discussion of the sample 
design in the context of the ELS:2002 third follow-up in 2012. 

3.1.2 Base-year Sample Design 

Full information on the base-year sample design can be found in chapter 3 and appendix J 
of the Base-year Data File User’s Manual (Ingels et al. 2004, NCES 2004-405). 

The ELS:2002 base-year sample design comprises two primary target populations — 
schools with 10th grades and sophomores in those schools — in the spring term of the 2001-02 
school year. ELS:2002 used a two-stage sample selection process. First, schools were selected. 
These schools were then asked to provide sophomore enrollment lists, from which students were 
selected. 

Schools and students are the study’s basic units of analysis. School-level data reflect a 
school administrator questionnaire, a library media center questionnaire, a facilities checklist, 
and the aggregation of student data to the school level. Student-level data consist of student 
questionnaire and assessment data and reports from students’ teachers and parents. (School-level 
data, however, can also be reported at the student level and serve as contextual data for students.) 

The target population of schools for the ELS:2002 base year consisted of regular public 
schools, including state Department of Education schools and charter schools, and Catholic and 
other private schools that contained 10th grades and were in the United States (the 50 states and 
the District of Columbia). 

The sampling frame of schools was constructed with the intent to match the target 
population. However, selected schools were determined to be ineligible if they did not meet the 
definition of the target population. Responding schools were those schools that had a Survey Day 
(i.e., data collection occurred for students in the school). 8 Of the 1,268 sampled schools, there 
were 1,221 eligible schools and 752 responding schools (68 percent weighted participation rate). 

Most, but not all, responding schools also completed a school administrator 
questionnaire and a library or media center questionnaire (99 percent and 96 percent weighted 
response rates, respectively). Most nonresponding schools or their districts provided some basic 
information about school characteristics, so that the differences between responding and 
nonresponding schools could be better understood, analyzed, and adjusted. Additionally, RTI 
field staff completed a facilities checklist for each responding school. 

The target population of students for ELS:2002 consisted of spring-term sophomores in 
2002 (excluding foreign exchange students) enrolled in schools in the school target population. 
The sampling frames of students within schools were constructed with the intent to match the 
target population. However, selected students were determined to be ineligible if they did not 
meet the definition of the target population. Of the 19,218 sampled students, there were 17,591 
eligible sophomores. The 15,362 participants on the public-use file represent a weighted student 
response rate of 87 percent. 
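
The rates quoted above are weighted, so they need not equal the simple ratio of respondents to eligible cases (for schools, 752 of 1,221 is well below 68 percent). The following Python sketch illustrates the distinction under a generic weighted-ratio formula. The function name and the toy weights are assumptions made for illustration; they are not taken from the ELS:2002 data files, and the actual NCES weighting procedures are more involved.

```python
def response_rate(responded, weights=None):
    """Share of eligible cases that responded, optionally weighted.

    responded: list of booleans, one per eligible case.
    weights:   list of design weights (hypothetical values here);
               None means every case counts equally (unweighted).
    """
    if weights is None:
        weights = [1.0] * len(responded)
    total = sum(weights)
    responding = sum(w for r, w in zip(responded, weights) if r)
    return responding / total


# Unweighted ratios implied by the counts reported in the text.
school_unweighted = 752 / 1221       # responding schools / eligible schools
student_unweighted = 15362 / 17591   # participants / eligible sophomores

# Toy illustration with made-up weights: the same two respondents out of
# three cases yield different unweighted and weighted rates.
toy_unweighted = response_rate([True, True, False])
toy_weighted = response_rate([True, True, False], weights=[10.0, 10.0, 5.0])
```

Because responding and nonresponding units can carry different weights, a weighted rate can sit above or below the unweighted ratio.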

The ELS:2002 base-year survey instruments comprised two assessments (reading and 
mathematics) and a student questionnaire. Participation in ELS:2002 was defined by 
questionnaire completion. Although most students were asked to complete the assessment battery 
in addition to the questionnaire, there were some cases in which a student completed the 
questionnaire but did not complete the assessments. Guidelines were provided to schools to assist 
them in determining whether students would be able to complete the ELS:2002 survey 
instruments. 

Students who could not complete the ELS:2002 questionnaire (by virtue of limited 
English proficiency or physical or mental disability) were part of the expanded sample of spring 
2002 sophomores who were followed in the study; their eligibility status was reassessed 2 years 
later. There were 163 such students. To obtain additional information about their home 
background and school experiences, contextual data were collected from the base-year parent, 
teacher, and school administrator surveys. 

The student sample was selected, when possible, in the fall or early winter so that sample 
teachers could be identified and materials could be prepared well in advance of Survey Day. 
However, selecting the sample in advance meant that some students transferred into the sample 
schools and others left between the time of sample selection and Survey Day. To address this 
issue, sample updating was conducted closer to the time of data collection. Complete enrollment 
lists were collected at both the time of initial sampling and the time of the sample update. 


8 One eligible school had no eligible students selected in the sample. This school was considered a responding school. 

One parent of the sample student and English and mathematics teachers of the sample 
student were also included in the base-year sample. 

3.1.3 First Follow-up Sample Design 

The first follow-up collected information from eligible base-year participants. In addition, 
during the first follow-up the sample was freshened 9 to be representative of students enrolled in 
12th grade during the spring term of the 2003-04 school year. The first follow-up fielded sample 
comprised 16,763 sample members (16,525 eligible from the base-year collection; 238 identified 
as part of sample freshening). Figure 2 shows the distribution of the sample members over time. 
For additional details, see the Base-year to First Follow-up Data File Documentation (Ingels et 
al. 2005, NCES 2006-344). 

3.1.3.1 High School Transcript Study Sample Design 

In the fall of 2004, high school transcripts were requested for all sample members who 
participated in at least one of the first two student interviews: the base-year interview or the first 
follow-up interview. Thus, sample members who were dropouts, freshened sample members, 
transfer students, homeschooled students, and early graduates are included if they were 
respondents in either the 2002 or 2004 interview. Transcripts were also requested for students 
who could not participate in either of the interviews because of a physical disability, a mental 
disability, or a language barrier. Further information about the transcript component may be 
found in Bozick et al. 2006 (NCES 2006-338), available to licensed users of the transcript data. 

3.1.4 Second Follow-up Sample Design 

The basis for the ELS:2002 second follow-up sampling frame was the sample of students 
selected in the base year when they were 10th-graders in 2002 combined with the sample of 
freshened students who were in the 12th grade in 2004. 

The second follow-up included all first follow-up eligible sample members except 
deceased students, students who were determined to be study ineligible at prior rounds, and 
sample members who were out of scope 10 in the first follow-up study. Eligible sample members 
who had responded in neither the base year nor the first follow-up were not fielded for the second 
follow-up. Similarly, freshened sample nonrespondents were not fielded for the second follow-up. 
Therefore, the second follow-up fielded sample consisted of 16,352 sample members. Figure 2 
shows the distribution of sample members over time. 


9 For more information on sample freshening, please see the Base-year to First Follow-up Data File Documentation (Ingels et al. 
2005, NCES 2006-344). 

10 Out-of-scope sample members include individuals who had not responded in any prior round, prior-round respondents who 
were incarcerated or out of the country, and prior-round study refusals. 

For additional details, see the Base-year to Second Follow-up Data File Documentation 
(Ingels et al. 2007, NCES 2008-347). 

Figure 2. ELS:2002 third follow-up full-scale sample: 2012 



NOTE: Permanently out-of-scope cases include study withdrawals, deceased, study ineligibles, and some individuals who did not 
respond in the prior two rounds. 

SOURCE: U.S. Department of Education, National Center for Education Statistics, Education Longitudinal Study of 2002 
(ELS:2002), Base Year, First Follow-up, Second Follow-up, and Third Follow-up. 

3.2 Third Follow-up Sample Design 

No additional sampling was performed for the third follow-up. The target populations for 
the third follow-up are the same as those in the first and second follow-ups; namely, those 
students who were enrolled in the 10th grade in 2002 and those students who were enrolled in 
the 12th grade in 2004. 

Figure 2 shows the distribution of the 17,754 eligible students sampled from 
approximately 750 schools in the base year plus the 238 students added during freshening in the 
first follow-up. (Some 1,464 of the 19,218 base-year sample members were found to be 
ineligible, leaving 17,754 eligible base-year sample members.) Eligible sample members 
who had responded in neither the second follow-up nor the first follow-up were not fielded for the 
third follow-up. The third follow-up sample consisted of the 16,352 sample members fielded for 
the second follow-up, excluding 176 individuals who were out of scope 11 for the third follow-up. 
A total of 16,176 sample members were fielded for the third follow-up. 
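
The sample counts reported in sections 3.1 and 3.2 can be cross-checked with simple arithmetic. The sketch below, in Python, uses only figures quoted in this chapter; the variable names are illustrative. The last check reflects one reading of the text (that the 17,754 eligible base-year sample members comprise the 17,591 eligible sophomores plus the 163 students who could not complete the questionnaire) and should be treated as an assumption rather than a documented definition.

```python
# Counts as reported in sections 3.1 and 3.2 of this chapter.
BASE_YEAR_SAMPLED = 19_218
BASE_YEAR_INELIGIBLE = 1_464
BASE_YEAR_ELIGIBLE = 17_754

F1_FROM_BASE_YEAR = 16_525
F1_FRESHENED = 238
F1_FIELDED = 16_763

F2_FIELDED = 16_352
F3_OUT_OF_SCOPE = 176
F3_FIELDED = 16_176

# Assumed decomposition (a reading of the text, not stated outright).
ELIGIBLE_SOPHOMORES = 17_591
QUESTIONNAIRE_INCAPABLE = 163

checks = {
    "base-year eligible": BASE_YEAR_SAMPLED - BASE_YEAR_INELIGIBLE == BASE_YEAR_ELIGIBLE,
    "first follow-up fielded": F1_FROM_BASE_YEAR + F1_FRESHENED == F1_FIELDED,
    "third follow-up fielded": F2_FIELDED - F3_OUT_OF_SCOPE == F3_FIELDED,
    "expanded sample (assumed)": ELIGIBLE_SOPHOMORES + QUESTIONNAIRE_INCAPABLE == BASE_YEAR_ELIGIBLE,
}

for label, ok in checks.items():
    print(f"{label}: {'consistent' if ok else 'inconsistent'}")
```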


11 Approximately one-third of the out-of-scope individuals were deceased, one-third were questionnaire-incapable or otherwise 
incapacitated, and one-third were a mix of study refusals, individuals who had not responded in the prior two rounds, and those 
who were institutionalized or incarcerated. 

Chapter 4. 

Data Collection Methodology, Results, and 
Response Rates 

The first two sections of this chapter present information on the data collection 
methodology and results. Next the chapter presents the responsive design methodology that was 
used in the third follow-up of the Education Longitudinal Study of 2002 (ELS:2002) to better 
target nonresponse follow-up efforts. Finally, this chapter presents information on response rates. 

More detailed accounts of the base-year, first follow-up, and second follow-up data 
collections can be found in the following National Center for Education Statistics (NCES) 
publications: 

• Education Longitudinal Study of 2002: Base-Year Data File User’s Manual (Ingels et 
al. 2004; NCES 2004-405); 

• Education Longitudinal Study of 2002: Base-Year to First Follow-up Data File 
Documentation (Ingels et al. 2005; NCES 2006-344); 

• Education Longitudinal Study of 2002: First Follow-up Transcript Component Data 
File Documentation (DFD) 12 (Bozick et al. 2006; NCES 2006-338); and 

• Education Longitudinal Study of 2002: Base-Year to Second Follow-up Data File 
Documentation (including Field Test report) (Ingels et al. 2007; NCES 2008-347). 

4.1 Third Follow-up Data Collection Methods 

This section details the data collection activities and procedures followed, including 
sample maintenance, tracing, responsive design methodology, survey modes, and refusal 
conversion. Section 4.2 provides quantitative information on the data collection outcomes. 

4.1.1 Locating and Tracing Activities 

Several locating methods were used to find and collect up-to-date contact information for 
the ELS:2002 third follow-up sample (figure 3). Batch searches of national databases and 
address update mailings to sample members and a parent were conducted prior to the start of data 
collection. Follow-up locating methods were employed for those sample members not found 
after the start of data collection (figure 4), including computer-assisted telephone interview 
(CATI) locating, computer-assisted personal interview (CAPI) field tracing, and intensive 
tracing. 


12 The transcript DFD report (NCES 2006-338) is available only to licensed users of the transcript data; however, substantial 
attention is given to the transcript component in the Base-Year to Second Follow-up DFD (NCES 2008-347) as well. 


Figure 3. Overview of the sample tracing process 




NOTE: CATI = computer-assisted telephone interview. SMs = sample members. 


Batch Tracing. Batch database searches were conducted on all sample members to 
update contact information in preparation for mailing activities. These searches used the U.S. 
Department of Education’s Central Processing System (CPS) and the U.S. Postal Service (USPS) 
National Change of Address databases. Then, just prior to the start of outbound telephone 
interviewing, all sample members were sent to Phone Append, which searches more than 400 
million landline, voice over Internet protocol, and wireless numbers in the United States, Puerto 
Rico, and Canada. All information obtained from these sources was compared with the 
information previously available from the ELS:2002 third follow-up locator database to identify 
any new contact information. 

Mailings. An initial letter announcing the data collection was mailed to sample members 
beginning on July 3, 2012, by USPS first-class mail. The mailing to all sample members 
included the following: 

• a letter, signed by both the project director and the NCES project officer, that announced 
the start of data collection; 

• information about the incentive; 

• a link to the study website; 

• login credentials for accessing the web interview; 

• the ELS:2002 e-mail address and toll-free help desk number; and 

• a brochure about the ELS:2002 third follow-up. 

Figure 4. Tracing process during data collection: 2012 

NOTE: CATI = computer-assisted telephone interviewing. 

SOURCE: U.S. Department of Education, National Center for Education Statistics. Education Longitudinal Study of 2002 
(ELS:2002) Third Follow-up. 

In addition to the letter, an e-mail was sent to sample members 2 days later. Both the 
letter and the e-mail encouraged sample members to complete the survey online during the early 
response period. Additional mailings during this early period included a reminder postcard and 
letter and additional e-mail reminders to encourage early interview response. Once outbound 
telephone interview efforts began, periodic reminder mailings as well as weekly e-mails were 
sent to sample members throughout the course of data collection. 

Approximately 3 weeks after the sample member letter was mailed, a letter announcing 
the start of data collection was also sent to one parent for each sample member. Two to 3 days 
later an e-mail was sent to the parent, if an e-mail address was known. A letter was also sent to 
parents of nonrespondent sample members in December 2012 so that parents could pass along 
the study information and reminder to the sample members. 

CATI Locating and Pre-intensive Tracing. Once outbound telephone interviewing 
began, telephone interviewers (TIs) conducted limited tracing and locating activities, as needed. 
The telephone number believed to be the best known number for contacting the sample member 
was attempted first. If the sample member could not be reached at that number after several 
attempts, any other numbers associated with the sample member, including parent and other 
contacts, were called. The interviewers attempted to gather locating information for the sample 
member from the contact who answered the call. If the sample member could not be located, the 
case was sent for intensive interactive tracing. 

Intensive Tracing. Cases that could not be located through batch tracing or CATI 
locating efforts underwent intensive tracing. Using a number of public domain and proprietary 
databases, intensive tracing personnel used a two-tiered strategy. The first tier used Social 
Security numbers (SSNs) to search for sample members in consumer databases such as 
FastData’s SSN search and Experian, which contains current address and telephone listings for 
the majority of consumers with credit histories. If a search generated a new telephone number for 
the sample member, tracers attempted to confirm the information by speaking with the sample 
member or with someone else who could confirm the information. If the number was confirmed, 
the case was sent back to CATI for telephone interviewing. This first level of effort minimized 
the time that cases were in tracing and unavailable for CATI efforts. 

If cases were not located (or locating information not confirmed) at the end of the first 
level of intensive tracing, the cases underwent a more intensive level of tracing, which included 
calls to other possible sources of information, such as directory assistance, alumni offices, and 
contacts with neighbors or landlords. Whenever any of these sources provided information that 
indicated that a sample member was not available for the study (e.g., deceased, incarcerated, or 
out of the country), no further contact efforts were made. 
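
The escalation described in this section can be summarized as a small state machine. The Python sketch below is illustrative only: the stage names, the `next_stage` function, and the `UNRESOLVED` end state are hypothetical labels introduced here, not terms from the study's case management system. It also assumes that the stop rule for unavailable sample members applies at every stage, whereas the text states it in the context of the second tier of intensive tracing, and it omits CAPI field tracing.

```python
from enum import Enum


class Stage(Enum):
    BATCH = "batch tracing"
    CATI_LOCATING = "CATI locating"
    INTENSIVE_TIER1 = "intensive tracing, tier 1 (SSN-based database searches)"
    INTENSIVE_TIER2 = "intensive tracing, tier 2 (calls to other sources)"
    CATI_INTERVIEW = "sent to CATI for telephone interviewing"
    STOP = "no further contact efforts"
    UNRESOLVED = "not located (hypothetical end state)"


# Order in which unlocated cases escalate, per the narrative above.
ESCALATION = [
    Stage.BATCH,
    Stage.CATI_LOCATING,
    Stage.INTENSIVE_TIER1,
    Stage.INTENSIVE_TIER2,
]


def next_stage(stage, located, unavailable=False):
    """Return the stage a case moves to after the current tracing stage.

    located:     contact information was found and confirmed.
    unavailable: a source indicated the sample member was deceased,
                 incarcerated, or out of the country.
    """
    if unavailable:
        return Stage.STOP
    if located:
        return Stage.CATI_INTERVIEW
    position = ESCALATION.index(stage)
    if position + 1 < len(ESCALATION):
        return ESCALATION[position + 1]
    return Stage.UNRESOLVED
```

For example, `next_stage(Stage.CATI_LOCATING, located=False)` returns `Stage.INTENSIVE_TIER1`, mirroring the handoff from telephone interviewers to intensive tracing staff.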

4.1.2 Interviewing 

Data collection for ELS:2002 third follow-up consisted of the following phases: 

• Pre-CATI. This phase began with the first mailings occurring on July 3, 2012, and 
lasted until August 5, 2012. Sample members were encouraged to complete the 
survey over the Web during this phase. The telephone interview was available to 
sample members who contacted the help desk, but no outbound telephone calls were 
made. Sample members who completed the interview were eligible to receive an 
incentive of $25. 

• Sample members who had ever dropped out of high school received mailings in the same 
period, but their cases went straight into CATI production (in early July) and did not have a 
pre-CATI phase. These sample members were offered $55 throughout the data collection 
period. 

• CATI production and increased incentives for targeted cases. The CATI production 
phase began on August 6, 2012. During this phase, interviewers called to encourage 
sample members to complete the interview by telephone or on the Web. Sample 
members who completed the interview in this phase were eligible to receive an 
incentive of $25 or $55 based on their response propensity group assignment (see 
section 4.3 for more information on the methodology). The sample members were 
divided into two groups, high propensity and low propensity. Sample members in the 
high-propensity group were offered $25 while the low-propensity cases were offered 
$55. Sample members were alerted with a letter and e-mail, and by interviewers if 
successful contact was made. 

• Consideration for field interviewing and increased incentives for targeted cases. The 
increased incentive phase began on September 21, 2012. During this phase, newly 
identified low-propensity cases were offered an increased amount of $55. Sample 
members were alerted with a letter and e-mail, and by interviewers if successful 
contact was made. Low-propensity cases from the current selection or the prior 
selection were considered for CAPI field work. Cases that were deemed a good fit for 
field work were assigned to field interviewers (FIs) beginning September 27, 2012. 

• Prepaid $5 incentive express mailing and increased incentives for targeted cases. 
This phase began on November 6, 2012. During this phase, the incentive value was 
increased to $55 for newly identified low-propensity cases. Sample members were 
alerted with a letter and e-mail, and by interviewers if successful contact was made. 
Low-propensity cases received their letter via express mail and it included a $5 check 
in advance for completing the interview. All other cases received a reminder letter in 
a 9x12 envelope via USPS and it did not include a prepaid check. 
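
The incentive rules in the phases above can be condensed into a short lookup. The following Python sketch is a simplified reading of those rules, not the study's actual case management logic. The phase labels and the function name are hypothetical, and two points are assumptions: the prepaid $5 is treated as separate from the promised amount (the text does not say whether it was deducted), and the prepaid check is tied only to the low-propensity flag.

```python
PHASES = ("pre_cati", "cati", "field", "prepaid")  # hypothetical labels


def incentive_offer(phase, low_propensity, ever_dropout=False):
    """Return (promised_dollars, prepaid_dollars) for a sample member.

    phase:          one of PHASES, in chronological order.
    low_propensity: True if the case had been identified as low propensity
                    by the time of this phase.
    ever_dropout:   True if the sample member ever dropped out of high
                    school (offered $55 throughout data collection).
    """
    if phase not in PHASES:
        raise ValueError(f"unknown phase: {phase!r}")

    # Propensity groups were used from CATI production onward, so the flag
    # is ignored during the pre-CATI phase.
    targeted = low_propensity and phase != "pre_cati"
    promised = 55 if (ever_dropout or targeted) else 25

    # Assumption: only low-propensity cases received the $5 prepaid check,
    # and only in the express-mailing phase.
    prepaid = 5 if (phase == "prepaid" and low_propensity) else 0
    return promised, prepaid
```

Under these assumptions, a high-propensity case is offered $25 in every phase, while a case identified as low propensity at the start of CATI production is offered $55 from then on.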

Sample members could complete the interview on the Web or by telephone throughout 
the entire data collection period. For TIs and FIs, the interview screens were identical to those 
in the web interviews completed by respondents, except that instructions on how to administer 
each question were visible at the top of each screen for telephone and field interviews. Following 
are details of the administration of the interview through the various modes. 

An additional method used to increase the number of interview completions was to offer 
an abbreviated interview. Beginning on January 7, 2013, all nonrespondent sample members 
were eligible for the approximately 10-minute abbreviated interview and the full-length 
interview was no longer accessible. Sample members were notified via a letter and e-mail. The 
letter was sent via an express shipping service. 
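
The phase start dates given in this section can be arranged as a timeline lookup. The Python sketch below uses only dates stated in the text; the phase labels and the `phase_on` function are illustrative. The exact end date of data collection is not given here (only "early February 2013"), so the final phase is left open-ended.

```python
from datetime import date

# Start dates as stated in the text; labels are paraphrases.
PHASE_STARTS = [
    (date(2012, 7, 3), "pre-CATI (web, inbound telephone only)"),
    (date(2012, 8, 6), "CATI production"),
    (date(2012, 9, 21), "field interviewing consideration and increased incentives"),
    (date(2012, 11, 6), "prepaid $5 express mailing"),
    (date(2013, 1, 7), "abbreviated interview only"),
]


def phase_on(day):
    """Return the label of the phase in effect on `day`.

    Returns None for dates before the first mailing. No end date is
    modeled, because the text does not give the exact close of data
    collection.
    """
    current = None
    for start, label in PHASE_STARTS:
        if day >= start:
            current = label
    return current
```

For instance, `phase_on(date(2012, 8, 5))` falls in the pre-CATI phase, and any date on or after January 7, 2013, maps to the abbreviated interview.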

Study Website and Help Desk. Sample members were provided with a link to the 
ELS:2002 third follow-up website prior to the start of data collection. The website provided 
general information about the study, including the study sponsor and contractor, how the data are 
used, answers to frequently asked questions (FAQs), confidentiality assurances, and selected 
findings from earlier rounds of ELS:2002. The website also provided contact information for the 
study help desk and project staff at RTI, as well as a link to the NCES website. Sample members 
were able to log in to the secure website to provide updated contact information and complete the 
sample member interview once it became available. 

Designed according to NCES web policies, the study website used a three-tier security 
approach to protect all data collected. The first tier of security included secure logins, with a 
unique study ID and strong password provided to sample members. The second tier of security 
protected any data entered on the website with secure socket layer technology, allowing only 
encrypted data to be transmitted over the Internet. The third tier of security stored any collected 
data in a secured SQL Server database located on a server machine that was physically separate 
from the web server. 

Sample members were provided with a toll-free telephone number, which was answered 
by help desk agents. Help desk staff were available to sample members who had questions or 
technical issues related to completion of the web interview. For each call received, staff 
confirmed contact information for the sample member and recorded a description of the problem 
and resolution. If technical difficulties prevented sample members from completing the web 
interview, help desk staff were able to complete a telephone interview. Two common types of 
help desk incidents were requests for login credentials and requests to complete the interview 
over the telephone. To minimize the need for help desk assistance, a “Forgot Password?” link 
was included on the study website and the need to disable pop-up blockers to launch the survey 
was eliminated. 

Web Interviews. Sample members were informed of the web interview in all project 
communications including mailings, e-mails, and telephone contacts. During the early response 
period (the first 4 weeks of data collection), only web interviews were completed unless sample 
members initiated a telephone interview by calling the help desk or sending an e-mail asking to 
be called. Dropout cases were an exception; they began receiving CATI calls a few days after 
their initial mailing. Reminder mailings and e-mails were sent throughout all phases of data 
collection to encourage sample members to complete the interview online. The website was 
accessible 24 hours a day, 7 days a week, throughout the data collection period, providing 
sample members with the option to complete the interview online at any time. 

Telephone Interviews. For sample members who ever dropped out of high school, 
outbound telephone follow-up began in early July. For all other cases, outbound telephone 
follow-up began on August 6, 2012, after the 4-week pre-CATI response period ended. TIs 
attempted to locate, gain cooperation from, and interview sample members who had not yet 
completed the interview. Interviewers encouraged sample members to complete the interview by 
telephone; however, the web survey remained available throughout data collection for sample 
members who preferred that option. Sample members who did express a preference to complete 
a web interview were called back 5 days later for follow-up if the interview had not yet been 
completed. 

The CATI Case Management System (CATI-CMS) included an automated call scheduler 
that assigned cases to interviewers by case priority, time of day, day of week, existence of 
previously scheduled appointments, and type of case. Case assignment was designed to 
maximize the likelihood of contacting and interviewing sample members, and cases were 
assigned to various queues accordingly. For example, the CATI-CMS included queues for new 
cases that had not been called, Spanish-language cases, initial refusals, and various appointment 
queues. In addition, available telephone numbers for each case were automatically prioritized for 
the interviewers. As new telephone numbers were added — as a result of CATI tracing, other 
tracing efforts, and information from other sources such as respondent e-mails or help desk call- 
ins — available telephone numbers were reprioritized based on the new information. 
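
The queue-based assignment described above can be illustrated with a minimal sketch. The queue names, their relative ranks, the example cases, and the rule that the most recently added telephone number is tried first are assumptions made for illustration only; the actual CATI-CMS scheduling logic is not specified in this document.

```python
from datetime import datetime

# Hypothetical queue ranks (lower value is delivered first); the real
# CATI-CMS ordering is not documented here.
QUEUE_RANK = {"appointment": 0, "new": 1, "spanish": 2, "initial_refusal": 3}


def next_case(cases, now):
    """Return the available case with the best (lowest) queue rank, or None."""
    ready = [c for c in cases if c["available_at"] <= now]
    if not ready:
        return None
    # Ties within a queue go to the case that became available earliest.
    return min(ready, key=lambda c: (QUEUE_RANK[c["queue"]], c["available_at"]))


def reprioritize_phones(phones):
    """Order numbers newest first (an assumed rule, for illustration only)."""
    return sorted(phones, key=lambda p: p["added_on"], reverse=True)


# Illustrative cases only; identifiers and times are made up.
cases = [
    {"case_id": "A", "queue": "new", "available_at": datetime(2012, 8, 6, 9, 0)},
    {"case_id": "B", "queue": "appointment", "available_at": datetime(2012, 8, 6, 18, 0)},
    {"case_id": "C", "queue": "initial_refusal", "available_at": datetime(2012, 8, 6, 9, 0)},
]
morning = next_case(cases, datetime(2012, 8, 6, 10, 0))   # appointment not yet due
evening = next_case(cases, datetime(2012, 8, 6, 18, 30))  # appointment now due
```

In this sketch an appointment case is held back until its scheduled time and then outranks every other queue, which mirrors the emphasis the text places on keeping appointments.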

Some cases required special treatment. To gain cooperation from those sample members 
who initially refused to participate (and from contacts such as parents and roommates who acted 
as gatekeepers to the sample member), interviewers were trained in refusal conversion 
techniques. For example, a TI would recontact the person, listen carefully, and acknowledge the 
issue or concern the person conveyed. He or she would then speak to the specific concern and also 
explain the importance of the study and the importance that individuals similar to the sample 
member are represented in the data. 

Quality control measures in CATI. In addition to training and certification procedures, a 
number of procedures were implemented to ensure and maintain data quality in CATI 
interviewing. Supervision and monitoring were maintained throughout data collection. 
Supervisors and monitors attended training alongside CATI staff so that they were familiar with 
the CATI interviewing procedures. 

Project staff regularly monitored telephone interviews during data collection to meet the 
following data quality objectives: 

• identification of problem items in the interview; 

• reduction in the number of interviewer errors; 

• improvement in interviewer performance through reinforcement of effective 
strategies; and 

• assessment of the quality of the data collected. 

Staff monitored interviews on all shifts, including live and recorded interviews, using 
RTI’s monitoring interface, QUEST. Interview monitors recorded their feedback on standardized 
monitoring forms that covered such topics as interviewer professionalism, question 
administration, and knowledge of the instrument. Interviewers received feedback from the 
session, and Quality Circle (QC) meetings frequently incorporated issues identified during 
monitoring to improve the overall quality of telephone interviews. 

Quality Circle meetings. Weekly QC meetings were conducted to serve as an essential 
feedback loop for ensuring that project staff and call center staff were communicating on a 
regular basis about the goals of the study and addressing challenges encountered along the way. 
These meetings provided a forum for discussing elements of the instrument design and interview 
cooperation tactics, motivating the group toward the goals of the study, and obtaining feedback 
on data collection issues. Weekly QC meetings for telephone staff were held at the call center. 
Issues discussed at these meetings were documented in meeting notes and stored where staff 
could access them as needed. The interviewers were informed of counts of interview completions 
to date, general data collection issues and issues specific to the survey instrument, and project 
staff responses to questions from interviewers. 

Throughout the study, a variety of issues were addressed at the QC meetings that 
reinforced specific content from training and contributed to prompt problem solving. Some of the 
issues covered in these meetings included the following: 

• clarification of questions and item responses and reinforcement of positive interviewing 
techniques; 

• methods of gaining cooperation from sample members and gatekeepers (e.g., spouses, 
parents, and roommates); 

• problem sheets submitted during contacting and interviewing; 

• the importance of providing and reviewing detailed case comments; 

• data security protocols; and 

• study progress and general morale boosting. 

TIs and their supervisors were given the opportunity to ask questions in QC meetings, 
and as needs were identified, additional training topics were highlighted and addressed in 
subsequent meetings. 

Field Interviews. Field interviewing assignments began on September 27, 2012. FIs 
attempted to locate, gain cooperation from, and interview sample members who had not yet 
completed the interview. Interviewers encouraged sample members to complete the interview 
either in person or by telephone; however, the web survey remained available throughout data 
collection for sample members who preferred that option. Sample members who did express a 
preference to complete a web interview were called back 5 days later for follow-up if the 
interview had not yet been completed. 

Case assignment was designed to maximize the likelihood of contacting and interviewing 
sample members, based on factors such as location of the last known address of the sample 
member and feasibility of sending a traveling FI. 

Quality control measures in CAPI. Like CATI efforts, CAPI data collection included 
multiple procedures to ensure that data quality standards were being maintained. The CAPI 
manager and field supervisors closely monitored CAPI production on a daily basis so that they 
could quickly address production issues and other field data collection challenges. Field 
supervisors held weekly conference calls with each of their FIs to discuss the status of each 
assigned case and ensure that appropriate efforts were being made for each case. During these 
calls, particular emphasis was placed on handling refusal cases and determining appropriate steps 
for locating cases. The CAPI manager also held weekly conference calls with each field 
supervisor to discuss field production and strategies and to communicate any updates on data 
collection plans. 

To maintain control of quality in CAPI data collection, verification interviews were 
conducted for a sample of each FI’s completed interviews. At the end of each CAPI interview, 
respondents were told that they might be contacted for quality control purposes. Verification 
calls and interviews were completed by in-house TIs. 

Completed CAPI interviews were sampled randomly within interviewer, but care was 
taken to ensure that at least 10 percent of each interviewer’s completed interviews were 
verified. The verification interview included a brief set of questions about the procedures 
followed during the original interview, including the date on which the interview occurred, the 
mode in which the interview was completed (by telephone or in person), the approximate 
duration of the interview, and the amount of the incentive paid. In addition, two key factual 
questions from the interview were asked again in the verification interview. Any problems 
detected through verifications were coded and reported to the CAPI manager. The CAPI 
manager and field supervisors ensured that issues were addressed with the field staff member in 
a timely manner. 
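
The within-interviewer selection described above can be sketched as follows. This is a minimal illustration, not the procedure actually used: the function name, the input layout, and the rounding rule (round up, with at least one case per interviewer) are assumptions chosen so that the 10 percent floor stated in the text is always met.

```python
import random
from collections import defaultdict


def select_for_verification(completed, min_percent=10, seed=None):
    """Randomly pick completed interviews for verification, per interviewer.

    completed: iterable of (interviewer_id, case_id) pairs.
    Returns {interviewer_id: [case_id, ...]} covering at least
    `min_percent` percent of each interviewer's completed interviews.
    """
    rng = random.Random(seed)
    by_interviewer = defaultdict(list)
    for interviewer_id, case_id in completed:
        by_interviewer[interviewer_id].append(case_id)

    selected = {}
    for interviewer_id, case_ids in by_interviewer.items():
        # Integer ceiling of min_percent% of the caseload, never below one case.
        n = max(1, (len(case_ids) * min_percent + 99) // 100)
        selected[interviewer_id] = rng.sample(case_ids, n)
    return selected


# Illustrative data: one FI with 25 completes, another with 3.
completed = [("FI-1", f"case-{i}") for i in range(25)]
completed += [("FI-2", f"case-{i}") for i in range(100, 103)]
picked = select_for_verification(completed, seed=1)
```

Rounding up rather than down means that an interviewer with only a few completed interviews still has at least one verified.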

4.1.2.1 Training of Data Collectors 

The in-house data collection staff included quality control supervisors (QCSs), quality 
experts (QEs), TIs, and intensive-tracing staff. Prior to beginning work on ELS:2002 third 
follow-up, all data collection staff completed a comprehensive training program. Topics covered 
in the training program included the following: 

• a review of confidentiality requirements; 

• an overview of the study; 

• FAQs; 

• administrative procedures for case management; and 

• hands-on practice. 

The training program was designed to maximize active participation of the trainees. Help 
desk support staff were trained on July 5, 2012, and interviewers were trained on help desk 
support and interviewing in two sessions before and during data collection. A total of 63 help 
desk staff/TIs were trained. The specific roles and duties of data collection staff are summarized 
in the following subsections, along with a description of the training program. 

Quality Control Supervisors and Quality Experts. QCSs provided support and 
guidance for the TIs and helped troubleshoot problems, and QEs monitored interviewer 
production. They attended ELS:2002 third follow-up project supervisor training and also 
participated in TI project training. The project supervisor training included an overview of the 
study, active listening techniques expected of interviewing staff, problem resolution, and other 
specific project procedures and protocols. 

Telephone Interviewers. As a primary point of contact with sample members, TIs were 
responsible for gaining cooperation from and conducting interviews with sample members, 
avoiding interview refusals, and addressing the concerns of reluctant sample members. Because 
of this integral role in the study, TIs received a project manual and 12 hours of training that 
included 

• an overview of ELS:2002 third follow-up; 

• an in-depth review of the questionnaire; 

• hands-on practice administering the telephone interview; 

• review of interviewing techniques; and 

• review of proper methods in handling inbound calls. 

To conclude the training and verify that the material had been learned, all TIs were certified by 
successfully conducting mock telephone interviews and by providing satisfactory responses to 
the study’s FAQs. 

In addition to the training described above, QCSs and TIs were also cross-trained as help 
desk agents who assisted any sample members who had questions or problems while completing 
web interviews or called in to complete the telephone interview. 

Tracing Staff. Tracing staff (tracers) used intensive measures, described in section 4.1.1, 
to locate sample members who lacked good telephone contact information. The 23 tracers 
attended a comprehensive 16-hour training session, led by RTI tracing managers, that covered all 
tracing procedures. Many ELS:2002 third follow-up tracers were cross-trained as TIs on the 
same project. 

Additionally, weekly QC meetings were routinely conducted as an extension of the 
training program for continual quality improvement. QC meetings are discussed in greater detail 
earlier in this section. 

Field Interviewers. In addition to the in-house staff described above, off-site FIs were 
also employed as data collectors. As a primary point of contact with sample members, FIs were 
responsible for gaining cooperation from and conducting interviews with sample members, 
avoiding interview refusals, and addressing the concerns of reluctant sample members. Because 
of this integral role in the study, the 44 FIs received a project manual and 30 hours of training 
that included 

• an overview of ELS:2002 third follow-up; 

• an in-depth review of the questionnaire; 

• hands-on practice administering the telephone interview; 

• review of interviewing techniques; and 

• specifics of the laptop computer and CMS. 

To conclude the training and verify that the material had been learned, all FIs were certified by 
successfully conducting mock interviews; providing satisfactory responses to the study’s FAQs; 
appropriately coding postsecondary fields of study, institutions, and occupations; and accurately 
completing an exercise on selecting and keying event codes. 

4.1.2.2 Refusal Conversion 

Another important challenge in planning data collection was developing procedures for 
sample members who initially refused to participate or who otherwise proved difficult to include in 
the study. The design of the incentive plan considered factors likely to increase the difficulty of 
including certain sample members, such as those who did not participate in the previous round 
and those who had dropped out of high school. The incentive plan was intended to reduce the 
potential for sample members to hesitate or refuse to participate when first contacted about the 
data collection. Because the incentive plan could not avert initial hesitation or refusal among all 
sample members nor address all reasons for hesitation or refusal, procedures were needed to 
overcome hesitation and avoid refusals among sample members. In addition, because not all 
refusals or difficult situations could be avoided, contingency procedures were needed to address 
initial refusals or other difficult-to-complete cases during data collection. This section describes 
the procedures in place to avoid refusals, manage initial refusals, and handle other difficult 
situations. 

Procedures for avoiding refusals. Procedures for avoiding refusal situations included four 
activities: interviewer training sessions, web/CATI QC meetings, field conferences, and sample 
management. Efforts to avoid sample member refusals began in the training sessions of web help 
desk and telephone and field interviewing staff. Training modules addressed common reasons for 
reluctance or refusal, strategies to address potential refusal situations, and consideration of 
specific reluctance or refusal statements and behaviors. Presentations, discussion, role-playing 
exercises, and a team competition were all used to prepare interviewers to address potential 
refusal situations. These training modules included specific objections from sample members or 
parents and potential interviewer responses that were directly based on experiences from prior 
rounds of ELS:2002 and other current education studies of young adults. 

Another important focus of interviewer training was addressing potential reluctance or 
specific objections among gatekeepers. Experience from prior rounds of ELS:2002 demonstrated 
the ways in which parents and other household members can either help or hinder efforts to 
contact sample members. Training modules also focused discussion and exercises on how 
interviewers can successfully address common gatekeeper concerns and objections. To assist 
CATI interviewers, the CATI-CMS program included scripted probes for interviewers to use 
when asking for parents’ or other contacts’ cooperation in reaching sample members. Because 
many sample members had completed high school and moved out of their parents’ household, 
gaining parent assistance in contacting sample members was often the first step in the survey 
participation process. 

These training sessions all followed the same general strategy for addressing reluctant 
sample members or gatekeepers, including the following: 

• understanding the reason(s) for the subject’s or gatekeeper’s reluctance as quickly as 
possible; 

• being prepared to address the concern(s) quickly and directly; 

• focusing responses on why the sample member’s participation is important to ELS:2002; 
and 

• using an effective tone and maintaining a professional approach. 

This strategy was illustrated through specific examples used in training modules. 

In addition, all interviewers were required to complete an exercise in responding to 
sample member or gatekeeper concerns as part of the certification process. This certification 
process reinforced using the refusal avoidance strategy to communicate the importance of sample 
members’ participation in ELS:2002. The most important points interviewers were trained to 
communicate to sample members and other contacts included: 

• reminding sample members and parents of their previous participation; 

• the importance of sample members’ continued participation; 

• the importance of ELS:2002 for education in the United States; 

• the incentive payment provided to sample members; 

• explanations of the 2012 data collection procedures and options; and 

• a toll-free number to talk with the data collection manager about the study. 

Key talking points and refusal avoidance strategies were regularly reinforced in QC 
meetings and field conference calls held with the help desk and interviewing staff. QC meetings 
and field conferences were held weekly during 2012 data collection to ensure that procedures 
were being followed correctly. The meetings provided a forum for interviewers to discuss 
specific examples of reluctance or refusal responses among sample members and possible steps 
to address these concerns. After the first few meetings focused on basic issues, data collection 
managers began to regularly add a session at the end of each meeting devoted to role-playing 
potential refusal situations. These sessions provided regular practice and discussion for 
interviewing staff so that they were prepared to address these situations effectively. 

Sample management activities in CATI data collection were also an important part of 
refusal avoidance procedures. Call scheduling procedures were designed to avoid inundating 
households in the sample with too-frequent calls. For example, when answering machines were 
reached and a message left, the CATI-CMS call scheduling system held these cases for at least 3 
days to give sample members or their parents some time to return the call. When sample 
members or parents requested a callback for a specific day and time, interviewers entered these 
appointments in CATI-CMS so that the appointment cases would be delivered to interviewers at 
the appropriate time. Telephone supervisors were continually aware of the need to keep all 
appointments, and monitored the status of upcoming appointments to ensure that all 
appointments were covered. These procedures ensured that appointments were kept regularly, 
which was a significant issue for sample members and parents with busy schedules. Another 
important feature of CATI-CMS that provided assistance in avoiding refusal situations was the 
call history log. After each call, interviewers entered relevant information about the results of the 
call and any interaction with sample members or other contacts in this log. These notes ensured 
that interviewers who made subsequent calls to contact sample members were aware of the 
results of previous calls. The call history log allowed interviewing staff to be sensitive to any 
concerns sample members, parents, or other contacts had about the 2012 data collection process 
and to be prepared to address those concerns in subsequent contacts. 

CAPI staff also used sample management techniques to avoid refusal situations. A key 
difference between CAPI and CATI was that field efforts to avert refusals were based on 
collaborations between field supervisors and FIs. FIs worked with their supervisors to develop an 
individual approach to each case based on the CATI call history for the case and the results of 
any initial contact attempts by the CAPI interviewer. Field staff maintained detailed 
documentation of contact attempts in an event log in the laptop CMS. FIs could then review this 
information when planning future contacts with each sample member and, if the case was 
transferred to another interviewer, provide information for those subsequent contact attempts. 

Procedures for converting refusals. Whenever a call resulted in a refusal, CATI 
interviewers followed a predetermined set of steps to classify the refusal situation. CATI-CMS 
produced a series of screens that allowed interviewers to specify the following information: 

• person who refused (sample member or other); 

• point at which the refusal occurred (prior to, during, or after the introduction); 

• strength of refusal (mild, firm, or hostile); and 

• any specific reasons mentioned for the refusal. 

Interviewing staff used both general strategies and the specific information in CATI-CMS, 
including the call history log, to develop a refusal conversion approach to each individual 
case. After a call resulted in a refusal and the information about the interaction was entered, 
CATI-CMS moved the case to a special refusal queue. Cases in the refusal queue were held for 
at least 1 week after the initial refusal before being made available by the call scheduler for 
subsequent refusal conversion attempts. In practice, the time interval between the initial refusal 
and the next contact was often longer than 1 week because multiple subsequent calls were often 
required to contact these sample members again. 
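
The refusal classification and the hold periods described in this section can be sketched as a small data structure plus a scheduling rule. The class, field, and outcome names below are hypothetical and are not taken from CATI-CMS; only the permitted category values, the 3-day hold after an answering machine message, and the 1-week hold after a refusal come from the text, and both holds are stated there as minimums.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

# Minimum hold periods stated in the text (days). Outcome labels are hypothetical.
MIN_HOLD_DAYS = {"answering_machine_message": 3, "refusal": 7}

REFUSED_BY = {"sample member", "other"}
REFUSAL_POINT = {"prior to introduction", "during introduction", "after introduction"}
REFUSAL_STRENGTH = {"mild", "firm", "hostile"}


@dataclass
class RefusalRecord:
    """One refusal, classified along the four dimensions listed in the text."""
    refused_by: str
    point: str
    strength: str
    reasons: list = field(default_factory=list)

    def __post_init__(self):
        if self.refused_by not in REFUSED_BY:
            raise ValueError(f"unknown refused_by: {self.refused_by!r}")
        if self.point not in REFUSAL_POINT:
            raise ValueError(f"unknown point: {self.point!r}")
        if self.strength not in REFUSAL_STRENGTH:
            raise ValueError(f"unknown strength: {self.strength!r}")


def earliest_next_call(outcome, call_date):
    """Earliest date on which the scheduler may release the case again."""
    return call_date + timedelta(days=MIN_HOLD_DAYS.get(outcome, 0))


# Illustrative refusal; the reason text is made up.
record = RefusalRecord("other", "during introduction", "mild", ["too busy"])
release_date = earliest_next_call("refusal", date(2012, 9, 3))
```

Because the holds are minimums, the release date is only a lower bound; as the text notes, the actual interval before the next contact was often longer.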

Delaying subsequent contact attempts was a key sample management procedure used to 
maximize the success of refusal conversion efforts. This break provided a short period to allow 
sample members to reconsider their participation in the study and, in some cases, to be in a more 
favorable situation to participate. For the same reason, CAPI interviewers who were assigned 
initial refusal cases from CATI data collection typically waited a week before attempting to 
contact these sample members. All refusal cases assigned to field data collection were reviewed 
carefully by the field supervisor and FI. Field staff would review the statements made by the 
prospective respondent or gatekeeper when he or she declined participation and develop a refusal 
conversion approach for each individual case. The approach of field staff sometimes included 
having the supervisor contact the household first or transferring the case to another CAPI 
interviewer when the original interviewer was unable to make progress with the case. 

Because data collection staff anticipated that a significant number of refusals would 
ultimately transpire, plans were made early in data collection to conduct specialized refusal 
conversion training sessions for telephone interviewing staff within a few weeks after the start of 
outbound CATI calling. The first refusal conversion training was conducted about 3 weeks after 
outbound CATI data collection, and a second training session was held 2 weeks later. For both 
training sessions, data collection staff selected interviewers with strong performance ratings to 
attend these trainings. Interviewers were also identified based on qualitative feedback from 
telephone supervisors and monitors. As a general rule, interviewers selected to be refusal 
conversion specialists were interviewers who had demonstrated skills in enlisting cooperation 
among sample members and avoiding initial refusals. The training sessions emphasized specific 
refusal conversion techniques tailored to the ELS:2002 sample of young adults, including 
overcoming objections, addressing concerns of gatekeepers, and providing alternatives for 
participation. Both group discussions and individual role-playing exercises were used in refusal 
conversion training. Only interviewers who had successfully completed one of these training 
sessions were allowed to call initial refusals. Table 12 in section 4.2 presents the results of 
refusal conversion efforts. 

4.2 Third Follow-up Data Collection Outcomes 

ELS:2002 third follow-up interviews were administered between July 4, 2012, and 
February 3, 2013. Of the 16,176 sample members, 15,724 were deemed to be in scope for the 
study after removing those who were ineligible (e.g., deceased) or out of scope for reasons such 
as being institutionalized, incarcerated, or out of the country. Of these eligible members, 13,250 
sample members (84 percent weighted and unweighted) completed a full interview or a partial 
interview. 13 The average length of a full interview was 32.8 minutes; the average length of an 
abbreviated interview was 12.8 minutes. The results presented in the remainder of this section, 
including those shown in all tables, refer to unweighted rates. Section 4.4 contains weighted 
response rates. 

The overall locating and interviewing results for the ELS:2002 third follow-up data 
collection effort, including sample members who were located but later excluded, are presented 
in figure 5. A sample member was considered located if an interview was completed, the sample 
member was successfully contacted despite not completing the interview, or the contact 
information in the study database had not been proven out of date or incorrect during contact 
attempts. As will be illustrated later in this section, second follow-up nonrespondents were 
particularly challenging to locate. 
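
The definition of a located sample member given above reduces to a simple rule, sketched here. The function and argument names are hypothetical; the three conditions are taken from the text.

```python
def is_located(interview_completed, contacted, contact_info_disproven):
    """Apply the three-part locating definition given in the text.

    A sample member counts as located if an interview was completed, if the
    sample member was successfully contacted, or if the contact information
    on file was never proven out of date or incorrect.
    """
    return interview_completed or contacted or not contact_info_disproven
```

Under this rule a case with no contact at all still counts as located as long as none of its contact information was shown to be wrong.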

Figure 5. Sample member locating and third follow-up response status: 2012 



SOURCE: U.S. Department of Education, National Center for Education Statistics. Education Longitudinal Study of 2002 
(ELS:2002) Third Follow-up. 


13 A partial interview is defined as any case where the respondent began the interview, completed it through the high school 
completion section, but broke off and did not return to complete the interview. 

Locating and Interview Results. Overall locating status varied by high school dropout 
status and prior-round response status, as shown in table 1. The unweighted locating rate was 
defined as the percentage of cases in which an interview was completed, the sample member was 
successfully contacted, or the contact information in the study database was not proven out of 
date or incorrect during contact attempts. The locate rate was significantly higher for those who 
responded to the ELS:2002 second follow-up compared to those who did not respond to the 
second follow-up. 

Table 1. Sample member locating in the third follow-up, by dropout status and second follow-up 
response status: 2012 

                                                 Located           Responding sample members
ELS:2002 dropout status and          Total              Percent               Percent    Percent
second follow-up response status  eligible    Number   of total    Number  of located   of total

Total                               15,724    15,153       96.4    13,250        88.8       84.3

Dropout status
  Ever dropped out                   1,762     1,610       91.4     1,384        87.4       78.5
  Not ever dropped out              13,962    13,543       97.0    11,866        88.9       85.0

Second follow-up response status
  Respondent                        13,837    13,497       97.5    12,135        91.0       87.7
  Nonrespondent                      1,887     1,656       87.8     1,115        70.4       59.1


NOTE: Count includes sample members who completed enough of the interview to count as a respondent. For “ever dropped out,” 
classified as dropout if at least one of the following conditions was met after the second follow-up: school reported that respondent 
had dropped out of school at any one of the enrollment status updates, respondent earned a General Educational Development 
credential, respondent did not receive a high school credential, or respondent’s high school transcript indicates dropped out, 
dismissed, or incarcerated. 

SOURCE: U.S. Department of Education, National Center for Education Statistics. Education Longitudinal Study of 2002 
(ELS:2002) Third Follow-up. 
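
As a worked check, the "percent of total" columns of table 1 can be recomputed from the published counts. The sketch below assumes only that each rate is the count divided by the total eligible, rounded to one decimal place; the variable and function names are hypothetical. The "percent of located" column is left as published and is not recomputed here.

```python
# (total eligible, number located, number responding) from table 1.
TABLE_1_COUNTS = {
    "Total": (15724, 15153, 13250),
    "Ever dropped out": (1762, 1610, 1384),
    "Not ever dropped out": (13962, 13543, 11866),
    "Respondent": (13837, 13497, 12135),
    "Nonrespondent": (1887, 1656, 1115),
}


def percent(numerator, denominator):
    """Unweighted rate as a percentage, rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


rates = {
    group: {
        "located_pct_of_total": percent(located, eligible),
        "responding_pct_of_total": percent(responding, eligible),
    }
    for group, (eligible, located, responding) in TABLE_1_COUNTS.items()
}

# Sample members removed as ineligible or out of scope (16,176 selected).
out_of_scope = 16176 - TABLE_1_COUNTS["Total"][0]

for group, r in rates.items():
    print(f"{group:<22}{r['located_pct_of_total']:>7}{r['responding_pct_of_total']:>7}")
```

For the total row this gives 15,153 / 15,724 = 96.4 percent located and 13,250 / 15,724 = 84.3 percent responding, which matches the unweighted response rate reported at the start of this section.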


A concerted effort was made to convince parents of the value of the study so that they 
would cooperate and share information with their children. In some cases, parents continue to act 
as gatekeepers for the sample members even though the sample members are several years 
removed from being minors. Dual mailings to both the sample members and their parents were 
sent. These procedures ensured at both the panel maintenance and data collection stages that 
parents were aware of the plans for the third follow-up. Direct mail and e-mail contacts with 
parents allowed parents to provide updated contact information for their young adults. Also, 
because some phone numbers available for sample members are numbers for their parents’ 
homes, CATI procedures were in place to guide interviewers on how to appropriately ask for and 
record new contact information for sample members from parents. Successful contacts with 
parents were an important part of interviewer training. 

Locate Rates by Source of Batch Update. Before and during the ELS:2002 third 
follow-up data collection, batch matching was conducted to gather the latest contact information 
for the 16,176 sample members. Multiple records were sent for matching per case. This effort 
confirmed contact information or provided new contact information for thousands of records for 
full-scale cases, as shown in table 2. The largest number of matches was obtained through 
FirstData Phone Append (6,281 records, or 41 percent of records sent), while Premium Phone 
had the highest match rate (49 percent of the smaller set of records sent). CPS for the Free 
Application for Federal Student Aid (FAFSA) matching was completed late in the data collection 
period, and only cases that were nonrespondents at the time were sent. In part because of which 
cases were sent, only 9.9 percent of records sent were matched through CPS. It is also 
unsurprising that CPS matched the fewest records, because CPS relies on postsecondary 
student loan application data: only 74 percent of ELS:2002 second follow-up respondents 
reported attending postsecondary institution(s), and it had been 8 years since the majority of 
sample members completed high school, so the numbers attending postsecondary institutions 
were likely to be low. Also, not all postsecondary students fill out the application for federal 
student aid, and those students would not appear in the CPS data. 

Table 2. Batch processing record match rates in the third follow-up, by tracing source: 2012 


Method of tracing    Number of records sent    Number of records matched    Percent matched
NCOA                 15,230                    710                          4.7
Phone Append         15,230                    6,281                        41.2
Premium Phone        8,637                     4,237                        49.1
CPS                  5,962                     589                          9.9


NOTE: Percent is based on the number of records sent for batch tracing. Because records were sent to multiple tracing sources, 
multiple record matches were possible. Match rate includes instances when sample member contact information was confirmed and 
when new information was provided. Numbers may not sum to total because of rounding. CPS = Central Processing System for the 
Free Application for Federal Student Aid. NCOA = National Change of Address. 

SOURCE: U.S. Department of Education, National Center for Education Statistics. Education Longitudinal Study of 2002 
(ELS:2002) Third Follow-up. 

Address Update Mailing Results. The three most recent panel maintenance address 
update mailings for the cohort, in which sample members and one parent were asked to confirm 
or update contact information, were conducted in fall 2010, fall 2011, and spring 2012. For the 
2011 mailing, a $10 incentive was offered to sample members if they or a parent updated or 
confirmed contact information. As shown in table 3, address updates or confirmations were 
received from 4,709 sample members (29 percent unweighted) in response to the request, which 
was sent via mail and e-mail. For the 2010 and 2012 mailings, the response was also successful, 
with address updates or confirmations being received from 3,993 sample members (25 percent 
unweighted) in response to the request in 2010 and 3,831 sample members (24 percent 
unweighted) in response to the request in 2012. In 2010, we began mailing not only to the 
sample member but also to one parent. A clear increase in response can be seen between 2010 
and the prior years. 

Table 3. Panel maintenance participation rates for the ELS:2002 full-scale sample, by round of 
panel maintenance: 2012 

                                         Responded
Round of panel maintenance    Sample¹    Number    Percent
2012                          16,176     3,831     23.7
2011                          16,176     4,709     29.1
2010                          16,176     3,993     24.7
2008                          16,197     1,760     10.9
2007                          16,197     1,998     12.3


1 The third follow-up sample consists of the second follow-up sample but excludes second follow-up sample members who were 
determined to be ineligible at the end of the second follow-up or during panel maintenance, such as deceased sample members, 
study-ineligible members, and other individuals determined to be permanently out of scope. 

SOURCE: U.S. Department of Education, National Center for Education Statistics. Education Longitudinal Study of 2002 
(ELS:2002) Third Follow-up. 


Interview Completion by Address Update Participation. Sample members who 
responded to a panel maintenance address update request in 2010, 2011, or 2012 responded to 
the third follow-up interview at a higher rate than those who did not respond to the request. 
Table 4 contains the third follow-up interview completion rates by panel maintenance address 
update response. 


Table 4. Interview completion rate in the third follow-up, by panel maintenance participation in 
2010, 2011, or 2012: 2012 



                                                               Interview respondent
                                   Number of eligible cases    Number    Percent
Total                              15,724                      13,250    84.3
Panel maintenance respondent       6,519                       6,348     97.4
Panel maintenance nonrespondent    9,205                       6,902     75.0


NOTE: Count includes sample members who completed enough of the interview to count as a respondent. 

SOURCE: U.S. Department of Education, National Center for Education Statistics. Education Longitudinal Study of 2002 
(ELS:2002) Third Follow-up. 


Table 5 shows combined panel maintenance participation (participation in 2010, 2011, or 
2012) and, of those participants, how many completed the third follow-up interview. Of the 42 
percent of sample members who were panel maintenance participants (in 2010, 2011, or 2012), 
97 percent completed the third follow-up interview. The table also contains panel maintenance 
response and interview response by sample member characteristics. For all but one characteristic, 
over 90 percent of the panel maintenance respondents completed the third follow-up interview. 
The exception was second follow-up nonrespondents; 84 percent of those panel maintenance 
respondents completed the interview. 

Table 5. Interview completion rate in the third follow-up, by panel maintenance participation in 
2010, 2011, or 2012: 2012, broken down by sample member characteristics 

                                                            Panel maintenance         Interview respondent of combined
                                                            response combined         panel maintenance respondents
Characteristic                                  Eligible    Number    Percent of      Number    Percent
                                                                      eligible
Total                                           15,724      6,519     41.5            6,348     97.4
Sex
  Male                                          7,744       2,856     36.9            2,752     96.4
  Female                                        7,980       3,663     45.9            3,596     98.2
Race/ethnicity
  Asian or Pacific Islander                     1,582       633       40.0            613       96.8
  Black or African American                     2,104       502       23.9            484       96.4
  Hispanic or Latino                            2,352       655       27.8            641       97.9
  Other/more than one race                      881         333       37.8            322       96.7
  White                                         8,805       4,396     49.9            4,288     97.5
Second follow-up response status
  Second follow-up respondent                   13,837      6,295     45.5            6,159     97.8
  Second follow-up nonrespondent                1,887       224       11.9            189       84.4
Dropout status
  Ever dropped out                              1,762       400       22.7            382       95.5
  Not ever dropped out                          13,962      6,119     43.8            5,966     97.5
Regular high school diploma (not GED, certificate of completion)
  Received regular diploma                      14,028      6,198     44.2            6,050     97.6
  Did not receive regular diploma               1,696       321       18.9            298       92.8
Postsecondary attendance (if known)
  Attended postsecondary institution(s)         12,527      5,973     47.7            5,882     98.5
  Did not attend postsecondary institution(s)   2,405       510       21.2            465       91.2


NOTE: Panel maintenance respondent count includes those who provided a panel maintenance update in 2010, 2011, or 2012. 
Interview respondent count includes sample members who responded to the ELS:2002 third follow-up survey. The “Other/more than 
one race” subgroup includes American Indian/Alaska Native cases. For “ever dropped out,” classified as dropout if at least one of 
the following conditions was met after the second follow-up: school reported that respondent had dropped out of school at any one 
of the enrollment status updates, respondent earned a GED, respondent did not receive a high school credential, or respondent’s 
high school transcript indicates dropped out, dismissed, or incarcerated. GED = General Educational Development credential. 
SOURCE: U.S. Department of Education, National Center for Education Statistics. Education Longitudinal Study of 2002 
(ELS:2002) Third Follow-up. 

Sample Members Requiring Intensive Tracing. Overall, 2,897 (18 percent) of the 
15,724 eligible sample members required intensive tracing (table 6). Twenty-eight percent of 
sample members who ever dropped out of high school required intensive tracing, compared with 
17 percent of sample members who never dropped out. Forty-four percent of second follow-up 
nonrespondents required intensive tracing, compared with 15 percent of second follow-up 
respondents. Of the 2,897 traced, 2,561 were located (88 percent) and of these, 1,375 (54 
percent) completed an interview. 

Table 6. Sample members requiring intensive tracing in the third follow-up, by dropout status 
and second follow-up response status: 2012 

Dropout status and second                             Cases requiring intensive tracing
follow-up response status           Total eligible    Number    Percent
Total                               15,724            2,897     18.4
Dropout status
  Ever dropped out                  1,762             495       28.1
  Never dropped out                 13,962            2,402     17.2
Second follow-up response status
  Respondent                        13,837            2,067     14.9
  Nonrespondent                     1,887             830       44.0


NOTE: Numbers may not sum to total because of rounding. Count excludes cases initiated to intensive tracing that were not traced. 
For “ever dropped out,” classified as dropout if at least one of the following conditions was met after the second follow-up: school 
reported that respondent had dropped out of school at any one of the enrollment status updates, respondent earned a General 
Educational Development credential, respondent did not receive a high school credential, or respondent’s high school transcript 
indicates dropped out, dismissed, or incarcerated. 

SOURCE: U.S. Department of Education, National Center for Education Statistics. Education Longitudinal Study of 2002 
(ELS:2002) Third Follow-up. 


Interview Outcomes by Mode. ELS:2002 third follow-up interviews were completed on 
the Web, by CATI with a TI, or by CAPI with an FI. Table 7 shows that 36 percent of interviews 
were completed on the Web without telephone contacts, 37 percent of interviews were completed 
on the Web after calls by TIs, 19 percent were completed with a TI, and 8 percent were 
completed with an FI. Summed, 73 percent of interviews were completed on the Web. This is in 
line with the expected increased Internet use over time and the greater proportion of web 
interviews expected closer to the beginning of the data collection period. The ELS:2002 second 
follow-up produced only 37 percent of the completed interviews on the Web, although it is 
important to keep in mind that the third follow-up took place over a shorter time period than the 
time period for the second follow-up. 


Table 7. Distribution of respondents in the third follow-up, by mode of administration: 2012 


                                Respondents
Mode of administration          Number    Percent of all respondents
Total                           13,250    100.0
Web, without telephone calls    4,786     36.1
Web, with telephone calls       4,838     36.5
Telephone (CATI) interview      2,573     19.4
Field (CAPI) interview          1,053     7.9


NOTE: Detail may not sum to totals because of rounding. Count includes sample members who completed enough of the interview 
to count as a respondent. CAPI = computer-assisted personal interview. CATI = computer-assisted telephone interview. 

SOURCE: U.S. Department of Education, National Center for Education Statistics. Education Longitudinal Study of 2002 
(ELS:2002) Third Follow-up. 


Table 8 shows the distribution of respondents by interview mode as well as 
characteristics including sex, race/ethnicity, socioeconomic status, second follow-up response 
status, and “ever high school drop-out” status. 

Table 8. Distribution of respondents in the third follow-up, by select characteristics and mode: 
2012 

                                              Web                   CATI                  CAPI
Subgroup                            Total     Number    Percent     Number    Percent     Number    Percent
                                                        of total              of total              of total
Total                               13,250    9,624     72.6        2,573     19.4        1,053     7.9
Sex
  Male                              6,271     4,317     68.8        1,347     21.5        607       9.7
  Female                            6,979     5,307     76.0        1,226     17.6        446       6.4
Race/ethnicity
  Asian or Pacific Islander         1,310     1,093     83.4        143       10.9        74        5.6
  Black or African American         1,702     928       54.5        543       31.9        231       13.6
  Hispanic or Latino                1,885     1,198     63.6        410       21.8        277       14.7
  Other/more than one race          726       516       71.1        146       20.1        64        8.8
  White                             7,627     5,889     77.2        1,331     17.5        407       5.3
Socioeconomic status (SES)
  Lowest quarter                    2,946     1,701     57.7        733       24.9        512       17.4
  Second quarter                    3,106     2,120     68.3        717       23.1        269       8.7
  Third quarter                     3,396     2,622     77.2        596       17.6        178       5.2
  Highest quarter                   3,788     3,169     83.7        526       13.9        93        2.5
Second follow-up response status
  Second follow-up respondent       12,135    8,995     74.1        2,325     19.2        815       6.7
  Second follow-up nonrespondent    1,115     629       56.4        248       22.2        238       21.3
Dropout status
  Ever dropped out                  1,384     702       50.7        262       18.9        420       30.3
  Not ever dropped out              11,866    8,922     75.2        2,311     19.5        633       5.3


NOTE: Count includes sample members who completed enough of the interview to count as a respondent. In this table, the 
“Other/more than one race” subgroup includes American Indian/Alaska Native cases. For SES status, 14 respondents were 
unclassified. For “ever dropped out,” classified as dropout if at least one of the following conditions was met after the second follow- 
up: school reported that respondent had dropped out of school at any one of the enrollment status updates, respondent earned a 
General Educational Development credential, respondent did not receive a high school credential, or respondent’s high school 
transcript indicates dropped out, dismissed, or incarcerated. CAPI = computer-assisted personal interview. CATI = computer- 
assisted telephone interview. 

SOURCE: U.S. Department of Education, National Center for Education Statistics. Education Longitudinal Study of 2002 
(ELS:2002) Third Follow-up. 


Response by Phase of Data Collection. Thirty-one percent of ELS:2002 third follow-up 
interviews were completed during the pre-CATI phase. Fifty-four percent of interviews were 
completed during the CATI phase. Fourteen percent were completed during the CAPI phase. 
Section 4.1.2 contains details of the phases of data collection. 

Interview response, by phase of data collection, is shown in table 9. Each phase of data 
collection can be evaluated individually. The pre-CATI phase of data collection yielded 4,291 
responses, translating to a 31 percent completion rate during the pre-CATI phase (4,291 
completed of 13,984 pre-CATI cases) (table 9). The next phase of data collection, CATI, yielded 
a 62 percent completion rate (7,110 completed of 11,459 eligible for the second phase). The third 
phase, CAPI, yielded a 63 percent completion rate (1,849 completed of 2,953 eligible assigned to 
the field). 
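
The per-phase completion rates quoted above follow directly from the counts in table 9. The short Python sketch below reproduces that arithmetic; the names `PHASES` and `completion_rate` are illustrative and are not part of any ELS:2002 software.

```python
# Counts taken from table 9: (cases in phase, completed interviews).
PHASES = {
    "Pre-CATI": (13_984, 4_291),
    "CATI": (11_459, 7_110),
    "CAPI": (2_953, 1_849),
}


def completion_rate(cases: int, completed: int) -> float:
    """Percent of cases in a phase that completed an interview, to one decimal."""
    return round(100.0 * completed / cases, 1)


rates = {phase: completion_rate(*counts) for phase, counts in PHASES.items()}
total_completed = sum(completed for _, completed in PHASES.values())

for phase, rate in rates.items():
    print(f"{phase}: {rate} percent")
print("Completed interviews across phases:", total_completed)
```

Because each phase has its own denominator (the cases still active when that phase began), the three rates are not shares of a common total and are not expected to sum to 100.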

Table 9. Number of cases and percentage of completed interviews by data collection phase in 
the third follow-up: 2012 


                                                     Completed interviews
Data collection phase    Number of cases in phase    Number    Percent of cases
Pre-CATI phase           13,984                      4,291     30.7
CATI phase               11,459                      7,110     62.0
  Ever dropped out       1,766                       732       41.4
  Never dropped out      9,693                       6,378     65.8
CAPI phase               2,953                       1,849     62.6
  Ever dropped out       1,009                       652       64.6
  Never dropped out      1,944                       1,197     61.6


NOTE: Partial interviews were not included because partially completed interviews could be resumed by sample members through 
the end of data collection. Sample members who ever dropped out of high school all started data collection in the CATI phase of the 
third follow-up. The cases include 26 sample members found to be ineligible (e.g., deceased) or otherwise out-of-scope (e.g., out-of- 
country, incapacitated, incarcerated) during the data collection period. For “ever dropped out,” classified as dropout if at least one of 
the following conditions was met after the second follow-up: school reported that respondent had dropped out of school at any one 
of the enrollment status updates, respondent earned a General Educational Development credential, respondent did not receive a 
high school credential, or respondent’s high school transcript indicates dropped out, dismissed, or incarcerated. CAPI = computer- 
assisted personal interviewing. CATI = computer-assisted telephone interviewing. 

SOURCE: U.S. Department of Education, National Center for Education Statistics. Education Longitudinal Study of 2002 
(ELS:2002) Third Follow-up. 

Interview completeness, by dropout status and second follow-up response status, is 
shown in table 10. Relative to full interviews, respondents who ever dropped out completed 
proportionally more abbreviated interviews than those who never dropped out (15.5 
percent versus 9.6 percent). In a similar pattern, respondents who did not complete the ELS:2002 
second follow-up interview completed markedly more abbreviated interviews, relative to full 
interviews, than those who did complete the ELS:2002 second follow-up interview (28.2 percent 
versus 8.8 percent). These percentages are consistent with the number of abbreviated interviews 
divided by the number of full interviews in each group (for example, 186 / 1,198 = 15.5 percent). 

Table 10. Interview completeness in the third follow-up, by dropout status and second follow-up 
response status: 2012 


ELS:2002 dropout status and second              Full interview       Abbreviated interview
follow-up response status             Total     Number    Percent    Number    Percent
Total                                 13,250    12,027    90.8       1,223     10.2
Dropout status
  Ever dropped out                    1,384     1,198     86.6       186       15.5
  Never dropped out                   11,866    10,829    91.3       1,037     9.6
Second follow-up response status
  Respondent                          12,135    11,157    91.9       978       8.8
  Nonrespondent                       1,115     870       78.0       245       28.2


NOTE: For “ever dropped out,” classified as dropout if at least one of the following conditions was met after the second follow-up: 
school reported that respondent had dropped out of school at any one of the enrollment status updates, respondent earned a 
General Educational Development credential, respondent did not receive a high school credential, or respondent’s high school 
transcript indicates dropped out, dismissed, or incarcerated. 

SOURCE: U.S. Department of Education, National Center for Education Statistics. Education Longitudinal Study of 2002 
(ELS:2002) Third Follow-up. 

Telephone Interviewer Call Counts. During the course of ELS:2002 third follow-up 
data collection, 308,717 CATI telephone calls were completed. TI time was spent administering 
the interview as well as performing case management activities such as locating and contacting 
sample members, prompting sample members to complete interviews, reviewing call history, 
scheduling appointments for callbacks, and entering detailed comments and suggestions to assist 
with reaching and interviewing sample members. Some of the time was also spent responding to 
incoming help desk calls. 

On average, 20 calls were made per ELS:2002 third follow-up sample member during the 
data collection period.¹⁴ Of the sample, cases that completed the third follow-up interview 
required an average of 15 calls, while third follow-up nonrespondents received an average of 47 
calls during the interviewing period. There were no significant differences in call counts between 
respondents who completed interviews over the telephone and respondents who completed 
interviews over the Web but with phone prompting. The average number of CATI telephone 
calls is shown in table 11. 


Table 11. Number and average of CATI calls, by dropout status and prior- and current-round 
response status and mode of interview in the third follow-up: 2012 


Response status and mode            Number of cases    Number of calls    Average number of calls
Total                               15,735             308,717            19.6
Dropout status
  Ever dropped out                  1,764              17,446             9.9
  Never dropped out                 13,971             291,271            20.9
Second follow-up response status
  Respondent                        13,844             255,868            18.5
  Nonrespondent                     1,891              52,849             28.0
Current-round response status
  Respondent                        13,250             192,669            14.5
    Web interview                   9,624              107,260            11.2
      Web with telephone calls      4,838              107,260            22.2
    Telephone interview             2,573              61,809             24.0
    Field interview                 1,053              23,600             22.4
  Nonrespondent or ineligible       2,485              116,048            46.7


NOTE: Count includes sample members who completed enough of the interview to count as a respondent. The cases (and the 
nonrespondent or ineligible category) include 11 sample members found to be ineligible (e.g., deceased) or otherwise out-of-scope 
(e.g., out-of-country, incapacitated, incarcerated) during the data collection period. For “ever dropped out,” classified as dropout if at 
least one of the following conditions was met after the second follow-up: school reported that respondent had dropped out of school 
at any one of the enrollment status updates, respondent earned a General Educational Development credential, respondent did not 
receive a high school credential, or respondent’s high school transcript indicates dropped out, dismissed, or incarcerated. 

SOURCE: U.S. Department of Education, National Center for Education Statistics. Education Longitudinal Study of 2002 
(ELS:2002) Third Follow-up. 


14 This includes sample members who required no call attempts. 

Averting and Converting Refusals. Refusal aversion and conversion techniques were 
integrated into TI training and were reinforced throughout data collection in QC meetings. 
Interviewers were encouraged to share their experiences gaining sample member cooperation 
and to seek guidance from the group. Refusal cases were placed in a separate queue and worked 
by a subset of interviewers selected for additional refusal conversion training. Overall, 12 
percent of eligible cases ever refused (i.e., refused and later completed an interview or refused 
and never completed an interview). Of these refusals, 47 percent subsequently completed the 
interview (table 12) after refusal conversion attempts were made. In the previous round (second 
follow-up), 13 percent of eligible cases ever refused and, of these, 8 percent subsequently 
completed the interview. 

Table 12. Refusal and refusal conversion rates in the third follow-up, by dropout status and 
second follow-up response status: 2012 


                                                     Ever refused interview    Respondent, given refusal
ELS:2002 dropout status and second    Total          Number    Percent of      Number    Percent of    Percent
follow-up response status             cases                    total                     refused       of total
Total                                 15,750         1,880     11.9            892       47.4          5.7
Dropout status
  Ever dropped out                    1,766          149       8.4             59        39.6          3.3
  Not ever dropped out                13,984         1,731     12.4            833       48.1          6.0
Second follow-up response status
  Respondent                          13,849         1,443     10.4            740       51.3          5.3
  Nonrespondent                       1,901          437       23.0            152       34.8          8.0


NOTE: Numbers may not sum to total because of rounding. Count includes sample members who completed enough of the 
interview to count as a respondent. The cases include 26 sample members found to be ineligible (e.g., deceased) or otherwise out- 
of-scope (e.g., out-of-country, incapacitated, incarcerated) during the data collection period. For “ever dropped out,” classified as 
dropout if at least one of the following conditions was met after the second follow-up: school reported that respondent had dropped 
out of school at any one of the enrollment status updates, respondent earned a General Educational Development credential, 
respondent did not receive a high school credential, or respondent’s high school transcript indicates dropped out, dismissed, or 
incarcerated. 

SOURCE: U.S. Department of Education, National Center for Education Statistics. Education Longitudinal Study of 2002 
(ELS:2002) Third Follow-up. 


Field Interviewing. During the course of ELS:2002 third follow-up data collection, 
1,849 CAPI interviews were completed. CAPI interviews were conducted in person and by 
phone, and sample members were still able to complete the web interview or call the help desk to 
complete a telephone interview once a case was sent to the field. Table 13 contains located and 
completed field interviews by dropout status and second follow-up response status. 

Table 13. Located and completed field interview cases in the third follow-up, by dropout status 
and second follow-up response status: 2012 

ELS:2002 dropout status and           Eligible cases                 Percent                  Percent     Percent of
second follow-up response status      sent to field     Located      of total    Responded    of total    located
Total                                 2,940             2,533        86.2        1,849        62.9        73.0
Dropout status
  Ever dropped out                    1,006             859          85.4        652          64.8        75.9
  Not ever dropped out                1,934             1,674        86.6        1,197        61.9        71.5
Second follow-up response status
  Respondent                          2,076             1,848        89.0        1,415        68.2        76.6
  Nonrespondent                       864               685          79.3        434          50.2        63.4


NOTE: Count includes sample members who completed enough of the interview to count as a respondent. Sample members were 
still able to complete the web interview or call the help desk to complete a telephone interview once a case was sent to the field. For 
“ever dropped out,” classified as dropout if at least one of the following conditions was met after the second follow-up: school 
reported that respondent had dropped out of school at any one of the enrollment status updates, respondent earned a General 
Educational Development credential, respondent did not receive a high school credential, or respondent’s high school transcript 
indicates dropped out, dismissed, or incarcerated. 

SOURCE: U.S. Department of Education, National Center for Education Statistics. Education Longitudinal Study of 2002 
(ELS:2002) Third Follow-up. 

4.3 Responsive Design Methodology 

4.3.1 Responsive Design for ELS:2002 Third Follow-up Nonresponse Follow-up 

NCES and RTI are researching new strategies for conducting more effective nonresponse 
follow-up during the data collection period. There is a general recognition in the survey literature 
that nonresponse follow-up should be a strategic activity that prioritizes cases with the goal of 
minimizing bias in the final survey estimates (see for example, Peytchev et al. 2010; Rosen et al. 
2011; Wagner 2012). Furthermore, there is strong evidence that the overall survey response rate 
is an inadequate measure of data quality (e.g., Curtin et al. 2000; Groves and Peytcheva 2008; 
Keeter et al. 2000). The greatest danger in nonresponse follow-up may be when a study brings in 
sample members who resemble those most likely to respond or those who have already 
responded (Schouten, Cobben, and Bethlehem 2009). Under this scenario, resources are spent on 
increasing participation, but little is done to minimize bias. Decreasing bias during the 
nonresponse follow-up depends on the cases that are ultimately interviewed (Peytchev, Baxter, 
and Carley-Baxter 2009). The critical factor is that nonresponding cases selected for targeting 
should be substantively different from the respondent set at any one point during data collection. 
The question remains how to go about selecting those cases. 

In the ELS:2002 third follow-up, a responsive design (Groves and Heeringa 2006) was 
implemented in an attempt to minimize nonresponse bias. The following sections describe the 
implementation approach. 

4.3.2 Responsive Design Approach 

The goal of the responsive design is to identify and target, via specific protocols or 
interventions, the nonresponding cases that are different from the respondent set at any one point. 
Although numerous approaches are available to identify cases (i.e., critical subgroups, propensity 
to respond), the ELS:2002 third follow-up used a Mahalanobis distance function to identify 
nonrespondent cases most unlike the existing respondent set. A large number of survey variables, 
paradata, and sampling frame variables were incorporated into the distance function calculation, 
providing an opportunity to target the cases most unlike respondents and therefore, if completed, 
most likely to reduce nonresponse bias. 

Following Li and Valliant (2009), the Mahalanobis Distance (MD) may be defined as: 

[Equation not recoverable from this text version; it expresses $MD_i$ in terms of $h_{ii}$, $w_i$, $n$, and $\bar{w}$.] 

where $h_{ii}$ is the leverage (hat diagonal) for the $i$th case, $w_i$ is the sample weight for the $i$th case, $n$ is 
the number of cases or observations, and $\bar{w}$ is the average sample weight. The hat diagonals are 
the diagonal elements of the hat matrix ($H$): 

$$H = X (X^{T} W X)^{-1} X^{T} W$$ 

where $W$ is a diagonal matrix of sample weights and $X$ is a matrix of variables that define the 
dimensions along which distances are calculated. 
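
The link between the weighted hat diagonals and a Mahalanobis-type distance can be checked numerically. The sketch below is illustrative only and is not the study's production code: it assumes that X contains an intercept column, uses simulated data, and leaves out the scaling constant that turns the quadratic form into a distance, because that constant depends on how the weighted covariance is normalized. The function names are hypothetical.

```python
import numpy as np


def weighted_leverages(Z, w):
    """Hat diagonals h_ii of H = X (X'WX)^{-1} X'W, where X = [1, Z]."""
    X = np.column_stack([np.ones(len(Z)), Z])
    XtWX = X.T @ (w[:, None] * X)
    # h_ii = w_i * x_i' (X'WX)^{-1} x_i
    return w * np.einsum("ij,ij->i", X @ np.linalg.inv(XtWX), X)


def weighted_quadratic_form(Z, w):
    """(z_i - zbar)' S^{-1} (z_i - zbar), with S = sum_j w_j (z_j - zbar)(z_j - zbar)'."""
    zbar = np.average(Z, axis=0, weights=w)
    D = Z - zbar
    S = D.T @ (w[:, None] * D)
    return np.einsum("ij,ij->i", D @ np.linalg.inv(S), D)


# Simulated inputs (assumptions for illustration, not ELS:2002 data).
rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 3))
w = rng.uniform(0.5, 3.0, size=200)

h = weighted_leverages(Z, w)
q = weighted_quadratic_form(Z, w)

# With an intercept in X, h_ii / w_i - 1 / sum(w) equals the weighted quadratic form.
identity_holds = bool(np.allclose(h / w - 1.0 / w.sum(), q))
print(identity_holds)
```

Here `w.sum()` plays the role of n times the average weight, which is why the leverage, the case weight, the number of cases, and the average weight are sufficient to express the distance.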

In the context of its use in the ELS:2002 third follow-up, Mahalanobis distance is defined 
as the distance between a nonresponding case and the weighted mean value of the complete set 
of responding cases. Therefore, cases with larger distance scores can be thought of as cases 
demonstrating large differences from the respondent set. That is, these large distance cases 
would be characterized by larger differences in the input variables from the weighted means of 
the variables for respondents. Identifying these cases and presenting the specifically targeted 
nonresponding cases with a higher incentive will in turn attempt to boost their participation and 
potentially reduce bias in estimates and also improve analytic power through higher sample sizes 
for these groups of cases of analytic interest. 
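
As a minimal sketch of the idea in the preceding paragraph, the following Python code scores each nonresponding case by its squared Mahalanobis distance from the weighted mean of the responding cases. It is written under stated assumptions: the function name and toy data are hypothetical, and the weighted covariance is normalized by the sum of the weights, which may differ from the normalization used in the study.

```python
import numpy as np


def distance_from_respondents(X_resp, w_resp, X_nonresp):
    """Squared Mahalanobis distance of each nonrespondent row from the
    weighted respondent mean, using the weighted respondent covariance."""
    mean = np.average(X_resp, axis=0, weights=w_resp)
    centered = X_resp - mean
    # Weighted covariance, normalized by the sum of weights (an assumption).
    cov = centered.T @ (w_resp[:, None] * centered) / w_resp.sum()
    diff = X_nonresp - mean
    # Solve a linear system rather than forming the inverse explicitly.
    return np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)


# Toy data: four respondents on a square and two nonrespondents.
X_resp = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
w_resp = np.ones(4)
X_nonresp = np.array([[1.0, 1.0], [3.0, 1.0]])

d2 = distance_from_respondents(X_resp, w_resp, X_nonresp)
print(d2)
```

In this toy case the first nonrespondent sits exactly at the respondent mean and scores zero, while the second lies two standard deviations away along one dimension and would rank first for targeting.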

A distance function was calculated at three points during data collection: 

1. right before outbound CATI began, 4 weeks into data collection; 

2. right before the CAPI period began, 9 weeks into data collection; and 

3. just prior to the prepaid incentive period, approximately 8 weeks prior to the end of 
data collection. 

At these points, the cases with the largest distance scores were offered a $55 incentive 
while the $25 base incentive remained intact for all other cases (excluding ever-dropout cases, 
which were offered $55 to complete the survey from the start of data collection). At each 
juncture, the cases identified for targeting were those with the largest distance scores that had 
not been targeted in a prior phase. At the third and final case selection point, cases identified for 
targeting received a $5 prepaid incentive in addition to the $55 incentive and other nonmonetary 
activities, such as enhanced tracing. Case targeting was based on distance scores and the 
anticipated yield. At the first intervention point, 1,169 cases were targeted; at the second point, 
2,390 cases were targeted; and at the third point, 1,721 cases were targeted. At each of these 
points, the Mahalanobis values were calculated and the targeted cases were selected. 
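
The selection rule described above (take the cases with the largest distance scores, skipping any case targeted at an earlier point) can be sketched as follows. This is a simplified illustration, not the study's procedure: the function name, case IDs, and scores are hypothetical, and the sketch reuses one set of scores across phases, whereas the study recalculated distances at each point and also weighed anticipated yield.

```python
def select_targets(distance_by_case, already_targeted, n_targets):
    """Return IDs of the n_targets cases with the largest distance scores,
    skipping cases that were targeted in an earlier phase."""
    candidates = [
        (score, case_id)
        for case_id, score in distance_by_case.items()
        if case_id not in already_targeted
    ]
    # Largest score first; the case ID breaks ties deterministically.
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    return [case_id for _, case_id in candidates[:n_targets]]


# Hypothetical distance scores for five nonresponding cases.
scores = {"A": 9.1, "B": 2.3, "C": 7.4, "D": 5.0, "E": 8.2}

phase1 = select_targets(scores, set(), 2)
phase2 = select_targets(scores, set(phase1), 2)
print(phase1, phase2)
```

Excluding previously targeted cases keeps each intervention point focused on cases that have not yet been offered the higher incentive.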

4.3.3 Mahalanobis Model Specification 

Variable Selection. Choosing variables to include in the distance function calculation is 
an important process. The goal is to identify variables that are (1) known for respondents and 
nonrespondents, and (2) important analytically so that bias in the final survey estimates would be 
problematic. Table 14 shows the variables used in calculating the Mahalanobis distance. 

Table 14. Variables used in the Mahalanobis calculation in the third follow-up: 2012 


Frame variables 
• School control (public, Catholic, other private) 
• School urbanicity (urban, suburban, rural) 

Survey variables 
• Parent’s highest level of education 
• High school transcript-reported cumulative GPA 
• Whether English is student’s native language 
• Ever earned GED/equivalency 
• Sex 
• Socioeconomic status 
• Diploma or certificate most likely to receive 
• Took or plans to take SAT or ACT 
• Took or plans to take Advanced Placement test 
• F1 enrollment status 
• Highest level of education respondent expects to complete 
• F1 sample member in-school grade-level status 

Paradata 1 
• F2 response status 
• F1 nonresponse type 
• Ever responded to a panel maintenance 
• F2 call count 
• Number of contact attempts 
• Response status for F3 panel maintenance 
• F1 and BY combined response status 

1 Paradata refers to data surrounding the survey interviewing process. 

NOTE: BY = base year. F1 = first follow-up. F2 = second follow-up. F3 = third follow-up. GED = General Educational Development 
credential. GPA = grade point average. 

SOURCE: U.S. Department of Education, National Center for Education Statistics. Education Longitudinal Study of 2002 
(ELS:2002) Base Year to Third Follow-up, and Common Core of Data/Private School Survey. 


Technical Issues With Mahalanobis Calculation. Multiple approaches are available for 
generating Mahalanobis distance scores. The initial approach for the phase 1 calculations was to 
take advantage of the close relationship between the leverage statistic and the Mahalanobis 
distance. Distance scores were generated for phase 1 by outputting the hat matrix from an 
ordinary least squares regression (unweighted), which forced into the regression all the variables 
of interest. In other words, no insignificant variables were dropped. From the hat diagonal value, 
the Mahalanobis distance was generated for each case. Upon further examination of this 
approach, it was apparent that the method used for generating scores compared nonrespondents 
to the full sample. Because the approach seeks to identify nonresponding cases that differ from 
the existing respondent set, the phase 1 approach was modified for use in phase 2. 
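
The relationship between the leverage statistic and the Mahalanobis distance that the phase 1 approach relied on can be written out as below. This is the standard identity for a regression that includes an intercept, stated here for reference rather than quoted from the ELS:2002 processing specifications; the symbols are notation introduced for this illustration.

```latex
h_{ii} = \frac{1}{n} + \frac{D_i^2}{n-1}
\quad\Longleftrightarrow\quad
D_i^2 = (n-1)\left(h_{ii} - \frac{1}{n}\right),
\qquad
D_i^2 = (x_i - \bar{x})^{\top} S^{-1} (x_i - \bar{x})
```

Here n is the number of cases in the regression, h_ii is the hat diagonal for case i, and x-bar and S are the mean vector and sample covariance matrix (n - 1 divisor) of the predictors. Because x-bar and S are computed from every case in the regression, the resulting distance is measured relative to the full sample, which is the limitation noted above.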
For the phase 2 distance calculations, the approach was adjusted to ensure a comparison 
between nonrespondents and the existing set of respondents. For this, a Stata program called 
Mahascore (Kantor 2006) was used. Before implementation, it was confirmed that the 
Mahascore program was calculating Mahalanobis scores in a theoretically justifiable manner by 
determining that, in test cases, the produced scores matched the scores defined in Li and Valliant 
(2009). The Li and Valliant method for calculating Mahalanobis scores that incorporate sampling 
weights was implemented in the R programming language and applied to sets of test data. The 
Mahascore program was also applied to the test data, and the resulting Mahalanobis scores 
matched the scores produced using the Li and Valliant method. 

Several other technical issues arose. The first involved categorical variables, which have to be 
represented as binary variables in the Stata Mahascore program. Some input variables that had all 
derived binary variables included in the Mahascore program (no reference or dropped category) 
did not end up contributing to the calculation of the Mahalanobis scores; that is, these variables 
were not accounted for in the final distance score. It was then necessary to analyze which cases 
would have been selected had the binary variables been handled appropriately. A post-case 
selection analysis was performed and it was determined that 95 percent of the selected cases 
would have been selected if the binary variables had been properly handled in Mahascore. The 
second issue involved small cell sizes in the respondent set of cases. Small cell sizes among 
respondents resulted in the inability to invert the variance-covariance matrix calculated from just 
respondent data, and therefore the final Mahalanobis score could not include those variables. 
Refer to section 4.4 for response rates by responsive design group and chapter 6 for a discussion 
of the results of the responsive design in relation to bias scores. 
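
The phase 2 idea, measuring each nonrespondent's distance from the respondent mean using the respondent covariance matrix, can be sketched in plain Python as below. This is a minimal unweighted illustration and not the Mahascore or Li and Valliant implementation; all function names and data are hypothetical. Raising an error on a singular matrix is a choice made for this sketch and does not describe how Mahascore behaves. A full set of binary indicators for one categorical variable (no reference category) always sums to 1, which is one way such a singular matrix arises.

```python
def mean_vector(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance_matrix(rows):
    """Unweighted sample covariance matrix (n - 1 divisor)."""
    n = len(rows)
    mu = mean_vector(rows)
    p = len(mu)
    return [
        [sum((r[j] - mu[j]) * (r[k] - mu[k]) for r in rows) / (n - 1)
         for k in range(p)]
        for j in range(p)
    ]


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def mahalanobis_sq(case, respondents):
    """Squared distance of one case from the respondent mean,
    scaled by the respondent covariance matrix."""
    mu = mean_vector(respondents)
    cov = covariance_matrix(respondents)
    diff = [a - b for a, b in zip(case, mu)]
    return sum(d * s for d, s in zip(diff, solve(cov, diff)))


# Hypothetical respondent data on two numeric variables.
respondents = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
score = mahalanobis_sq((3.0, 1.0), respondents)

# Two indicators covering every category of one variable: always sum to 1.
all_dummies = [(1.0, 0.0), (0.0, 1.0), (1.0, 0.0), (0.0, 1.0)]
```

Dropping one indicator per categorical variable removes the linear dependence in the second data set, so the covariance matrix can be inverted.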

4.4 Base-year to Third Follow-up Response Rates 

Response rates for ELS:2002 are calculated by dividing the number of sample units who 
completed a particular study component by the number of sample units eligible for participation 
that are fielded. Sample members are not eligible if they are classified as deceased, sampling 
errors, or temporarily out of scope (unavailable for duration of study, out of the country, 
ineligible, incarcerated, or institutionalized). Eligible (in-scope) cases who were not contacted 
for participation (i.e., unfielded cases) are not counted in the response rate. All weighted response 
rates are calculated using the base weight appropriate for a given survey. 15 For each round of data 
collection, nonresponse bias analyses were performed to ensure that any identified biases 
resulting from nonresponse were small or were adjusted for, and that the data could be used with 
confidence. Response rate data for ELS:2002 are summarized in table 15. 
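
The response rate definition above can be sketched as a small function. The record layout (keys `eligible`, `fielded`, `responded`, `base_weight`) and the toy data are hypothetical and are not the layout of the ELS:2002 files.

```python
def response_rate(cases, weighted=True):
    """Percent of eligible, fielded cases that responded.

    Each case is a dict with hypothetical keys: 'eligible', 'fielded',
    'responded' (booleans) and 'base_weight' (float).
    """
    numerator = 0.0
    denominator = 0.0
    for case in cases:
        if not (case["eligible"] and case["fielded"]):
            continue  # ineligible and unfielded cases are excluded
        w = case["base_weight"] if weighted else 1.0
        denominator += w
        if case["responded"]:
            numerator += w
    return 100.0 * numerator / denominator


# Toy data: three eligible fielded cases, one unfielded case, one ineligible case.
cases = [
    {"eligible": True, "fielded": True, "responded": True, "base_weight": 100.0},
    {"eligible": True, "fielded": True, "responded": False, "base_weight": 300.0},
    {"eligible": True, "fielded": True, "responded": True, "base_weight": 100.0},
    {"eligible": True, "fielded": False, "responded": False, "base_weight": 500.0},
    {"eligible": False, "fielded": True, "responded": False, "base_weight": 200.0},
]
weighted_rate = response_rate(cases)
unweighted_rate = response_rate(cases, weighted=False)
```

In the toy data the weighted and unweighted rates differ because the one nonrespondent carries a larger weight than the respondents, which is why the two columns in the response rate tables need not agree.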


15 For example, the third follow-up used the first follow-up design weight adjusted for unknown eligibility and scope. 

Table 15. Summary of ELS:2002 base-year to third follow-up school and student response rates: 
2002-2012 

Survey                                    Eligible  Participated  Weighted percent  Unweighted percent
Base-year school sample                      1,221           752              67.8                61.6
Base-year student questionnaire             17,591        15,362              87.3                87.3
First follow-up questionnaire               16,515        14,989              88.7                90.8
First follow-up high school transcripts     16,373        14,920              90.7                91.1
Second follow-up questionnaire              15,892        14,159              88.4                89.1
Third follow-up questionnaire               15,724        13,250              83.8                84.3
2002 sophomore cohort in 2012               15,568        13,133              83.9                84.4
2004 senior cohort in 2012                  13,635        11,652              84.8                85.5

NOTE: Base-year student questionnaire and first follow-up questionnaire rates based on public release file. Second follow-up 
questionnaire and third follow-up questionnaire rates based on fielded in-scope cases. 

SOURCE: U.S. Department of Education, National Center for Education Statistics. Education Longitudinal Study of 2002 
(ELS:2002) Base Year to Third Follow-up. 

Base-year school and student questionnaire response rates. Of 1,221 eligible contacted 
schools, 752 participated in the survey for an overall weighted school participation rate of 68 
percent. These schools are nationally representative of public and private schools. Of 17,591 
selected eligible students, 15,362 participated, for a weighted student response rate of 
approximately 87 percent. A nonresponse bias analysis was performed and it confirmed that any 
identified biases resulting from nonresponse were small and that the data could be used with 
confidence. 

First follow-up student questionnaire response rates. In the first follow-up, there were 
16,515 eligible sample members, which included 15,362 base-year participants and 1,153 
retained base-year nonparticipants. This subsample of nonrespondents was pursued to further 
improve the representativeness and reduce the bias of the base-year sophomore cohort 
participating sample. A total of 14,989 sample members responded to the questionnaire for a first 
follow-up response rate of 89 percent. First follow-up weighted response rates are reported at the 
student level only (the school sample was not strictly representative of the nation’s high schools 
that had a 12th grade in 2003-04). 

High school transcript response rates. A total of about 1,500 base-year and transfer 
schools provided at least one transcript for sample members. Ninety-one percent (weighted) of 
the student sample have some transcript information (14,920 of 16,373). 

Second follow-up response rates. The second follow-up consisted of 16,352 fielded 
sample members. Of these, 15,892 sample members were found to be eligible and 14,159 
responded to the second follow-up interview, for a weighted response rate of 88 percent overall. 

Third follow-up response rates. The third follow-up sample consisted of 16,352 sample 
members fielded for the second follow-up but excluded 176 individuals who were not fielded or 
out-of-scope 16 for the third follow-up. Thus, 16,176 sample members were fielded for the third 
follow-up. During the third follow-up data collection, 452 sample members were determined to 
be ineligible, resulting in 15,724 eligible sample members. A total of 13,250 sample members 
responded to the interview, for a weighted response rate of 84 percent overall. For the second 
and third follow-ups (but not the base year or first follow-up), the response rate is a conditional 
one, based on the cases that were fielded. 17 Weighted response rates were calculated using the 
design weight (i.e., the third follow-up used the first follow-up design weight adjusted for 
unknown eligibility and scope). 18 The third follow-up weighted response rate, therefore, 
represents the proportion of the combined 10th- and 12th-grade population that was in-scope for 
the third follow-up, was fielded, and that responded. Response rates for the third follow-up, by 
select characteristics and interview mode, are shown in table 16. 

Table 16 also contains weighted and unweighted response rates for the three responsive 
design groups. Seventy-three percent (weighted) of cases targeted for treatment completed the 
ELS:2002 third follow-up interview. The group targeted at the start of CATI yielded 882 
completed interviews for a 77 percent weighted response rate. The group targeted at the start of 
CAPI yielded 1,641 completed interviews for a 70 percent weighted response rate. The third 
group, targeted at the start of the prepaid incentive mailings, yielded 1,257 completed interviews 
for a 73 percent weighted response rate. Refer to chapter 6 for a discussion of the results of the 
responsive design in relation to bias scores. 
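
As a check on the arithmetic, the unweighted rates for the responsive design groups can be reproduced from the eligible and respondent counts reported in table 16. The weighted rates cannot be reproduced this way because they depend on case-level weights. Variable and function names are chosen for this illustration.

```python
# (total eligible, respondents) for each responsive design group, from table 16.
groups = {
    "Targeted at CATI start": (1146, 882),
    "Targeted at CAPI start": (2320, 1641),
    "Targeted at pre-pay start": (1730, 1257),
}


def unweighted_percent(eligible, respondents):
    return round(100.0 * respondents / eligible, 1)


rates = {name: unweighted_percent(*counts) for name, counts in groups.items()}

# The "All groups" row is the sum over the three phases.
total_eligible = sum(eligible for eligible, _ in groups.values())
total_respondents = sum(respondents for _, respondents in groups.values())
overall = unweighted_percent(total_eligible, total_respondents)
```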


16 Approximately one-third of the individuals were deceased, one-third were questionnaire-incapable or otherwise incapacitated, 
and one-third were a mix of study refusals, individuals who had not responded in the prior two rounds, and those who were 
institutionalized or incarcerated. 

17 An unconditional response rate would include cases that were not fielded in the third follow-up: double (base-year + first 
follow-up) nonrespondents, senior freshening sample nonrespondents, and sample members who withdrew from the study. 
The unconditional weighted response rate (overall weighted response rate) was 78.2 percent. Permanently out-of-scope cases do 
not count in the response rate calculation although their numbers have been documented. 

18 Weighted response rates using the base weight appropriate for the survey are presented because of the importance of 
population estimation and because NCES survey response standards are based on weighted completions. This chapter’s 
methodological tables in section 4.2 show unweighted proportions, because of their different focus. 

Table 16. Third follow-up response rates, by select characteristics and mode: 2012 

Subgroup                            Total eligible  Respondents  Weighted percent  Unweighted percent
Total                                       15,724       13,250              83.8                84.3

Sex
  Male                                       7,744        6,271              80.4                81.0
  Female                                     7,980        6,979              87.1                87.5

Race/ethnicity
  Asian or Pacific Islander                  1,582        1,310              83.8                82.8
  Black or African American                  2,104        1,702              81.1                80.9
  Hispanic or Latino                         2,352        1,885              78.7                80.1
  Other/more than one race                     881          726              79.6                82.4
  White                                      8,805        7,627              86.2                86.6

Socioeconomic status (SES)
  Lowest quarter                             3,756        2,976              78.4                79.2
  Second quarter                             3,723        3,088              82.5                82.9
  Third quarter                              3,884        3,278              84.4                84.4
  Highest quarter                            4,361        3,908              90.0                89.6

Second follow-up response status
  Second follow-up respondent               13,837       12,135              87.3                87.7
  Second follow-up nonrespondent             1,887        1,115              60.8                59.1

Dropout status
  Ever dropped out                           1,762        1,384              79.0                78.5
  Not ever dropped out                      13,962       11,866              84.5                85.0

Responsive design group
  All groups                                 5,196        3,780              72.6                72.7
  Targeted at CATI start                     1,146          882              76.9                77.0
  Targeted at CAPI start                     2,320        1,641              70.1                70.7
  Targeted at pre-pay start                  1,730        1,257              72.9                72.7

Interview mode
  Web                                       15,724        9,624              60.0                61.2
  CATI                                      11,469        2,573              22.7                22.4
  CAPI                                       2,940        1,053              37.0                35.8


NOTE: Numbers may not sum to total because of rounding. The weighted percent uses the first follow-up design weight adjusted for 
unknown eligibility and scope. Respondent count includes sample members who completed enough of the interview to count as a 
respondent. The “Other/more than one race” subgroup includes American Indian/Alaska Native cases. SES status applies to the 
parent of the respondent. Interview mode total eligible counts may overlap because a respondent could have been eligible for 
multiple modes. For total eligible CATI mode, 36 cases completed via CATI during the pre-CATI phase. For “ever dropped out,” 
classified as dropout if at least one of the following conditions was met after the second follow-up: school reported that respondent 
had dropped out of school at any one of the enrollment status updates, respondent earned a General Educational Development 
credential, respondent did not receive a high school credential, or respondent’s high school transcript indicates dropped out, 
dismissed, or incarcerated. CAPI = computer-assisted personal interview. CATI = computer-assisted telephone interview. 

SOURCE: U.S. Department of Education, National Center for Education Statistics. Education Longitudinal Study of 2002 
(ELS:2002) Third Follow-up. 

Chapter 5. 

Data Preparation and Processing 

This chapter documents the automated systems, data processing, cleaning, and editing 
activities of the Education Longitudinal Study of 2002 (ELS:2002) third follow-up. 

5.1 Overview of Systems Design, Development, and Testing 

Many of the systems and processes used in the ELS:2002 third follow-up were designed 
during the first follow-up field test with improvements implemented for the main study and later 
for the second follow-up. The following systems were developed and used for the first follow-up 
and employed and improved thereafter: 

• Integrated Management System (IMS) — a comprehensive tool used to exchange files 
between RTI and the National Center for Education Statistics (NCES), post daily 
production reports, and provide access to a centralized repository of project data and 
documents; 

• Survey Control System (SCS) — the central repository of the status of each activity for 
each case in the study; 

• Hatteras Survey Engine and Survey Editor — a web-based application used to develop and 
administer the ELS:2002 instrument; 

• Computer-assisted telephone interview (CATI) Case Management System (CMS) — a call 
scheduler and case delivery tracking system for telephone interviews; 

• Integrated Field Management System (IFMS) — a field reporting system to help field 
supervisors track the status of in-school data collection and field interviewing; 

• ELS:2002 survey website — public website hosted at NCES and used to disseminate 
information, collect sample data, and administer the survey; and 

• Data-cleaning programs — SAS programs developed to apply reserve code values where 
data are missing, clean up inconsistencies (because of respondents backing up), and fill 
data where answers are known from previously answered items. 

System development included the following phases: planning, design, development, 
testing, and execution and monitoring. Specifications were developed in word processing 
documents and flowchart applications. Specifications were updated to reflect what changed 
between the field test questionnaires and the full-scale questionnaires. 

Each system implements safeguards to handle personally identifying information (PII) as 
applicable. Systems such as the IMS, Hatteras, and the SCS are standard RTI systems used 
successfully in the ELS:2002 third follow-up field test and were developed using the latest 
software tools such as Microsoft .NET and Microsoft SQL Server database. 

Processing of PII by all systems was developed in accordance with the Federal 
Information Processing Standards (FIPS) moderate security standard. Data containing PII were 
moved between locations in a manner that met the applicable security requirements: the data 
were encrypted in a manner that met the FIPS 140-2 standard and were decrypted once they 
successfully reached the destination. The automated systems were developed to move data and 
files between locations efficiently and securely. 

5.2 Data Processing and File Preparation 

Item documentation procedures were developed to capture variable and value labels for 
each item. Item wording for each question was also provided as part of the documentation. This 
information was loaded into a documentation database that could export final data file layouts 
and format statements used to produce formatted frequencies for review. The documentation 
database also had tools to produce final electronic codebook (ECB) input files. 

5.3 Data Cleaning and Editing 

Questionnaire data were stored in an SQL database that was consistent across data 
collection modes for a particular questionnaire. The instrument used to administer the web 
survey was the same instrument as the CATI instrument and CAPI instrument, and the 
questionnaire data were stored in the same SQL database. This ensured that skip patterns were 
consistent across applications. 

Editing programs were developed to identify inconsistent items across logical patterns 
within the questionnaire. These items were reviewed, and rules were written to either correct 
previously answered (or unanswered) questions to match the dependent items or blank out 
subsequent items to stay consistent with previously answered items. 

Programs were also developed to review consistencies across multiple sources of data 
and identify discrepancies that required further review and resolution. Consistency checks 
included unlikely patterns across rounds (e.g., inconsistencies between second follow-up and 
third follow-up reports) as well as across sources within a given round. 

5.3.1 Cleaning and Editing Methods 

Variables drawn directly from third follow-up questionnaire items (i.e., variables 
following the naming convention F3A##, F3B##, F3C##, or F3D##) were edited in three ways: 
(1) they were edited via the application of reserve codes (section 5.3.2); (2) they were edited by 
carrying forward known information from previously administered items/variables to 
downstream items/variables which were legitimately skipped during survey administration 
(section 5.3.3); and (3) they were edited to address inconsistent responses (section 5.3.4). 

5.3.2 Application of Reserve Codes 

Reserve codes employed by third follow-up questionnaire variables follow conventions 
established in prior rounds of ELS:2002 data: that is, values of -3 indicate “item legitimate skip/ 
item not applicable”; values of -4 indicate “nonrespondent”; values of -5 indicate “suppressed” 19 ; 
values of -7 indicate “not administered — abbreviated interview or break-off” 20 ; values of -8 
indicate “survey component legitimate skip”; and values of -9 indicate “missing.” For more 
information see chapter 7, section 7.3.1. Editing programs were developed to apply reserve codes 
as follows: (1) values of -4 and -8 were globally applied across all third follow-up questionnaire 
variables based on nonparticipation status (-4s for third follow-up nonrespondents, and -8s for 
third follow-up out-of-scope sample members); (2) values of -3 were conditionally applied to 
reflect the routing logic of the third follow-up survey instrument; and (3) values of -7 were 
conditionally applied based on the type of questionnaire to which the sample member responded 
(full-length or abbreviated), and whether the variable in question was included in both the full- 
length and abbreviated questionnaire, or whether it was included in the full-length questionnaire 
only. After application of reserve codes, cross-tabulations for each third follow-up questionnaire 
variable were produced and reviewed, and editing programs were subsequently revised as 
necessary. This process continued iteratively until reserve codes were properly applied across all 
third follow-up variables. 
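
The reserve code rules above can be sketched as a single function applied to one variable for one sample member. The actual editing programs were written in SAS; this Python sketch is illustrative only. The argument names are hypothetical, and the order in which the -7 and -3 rules are checked is an assumption of the sketch, not something the text specifies.

```python
LEGITIMATE_SKIP = -3      # item legitimate skip / item not applicable
NONRESPONDENT = -4        # third follow-up nonrespondent
NOT_ADMINISTERED = -7     # item not on the abbreviated questionnaire
COMPONENT_SKIP = -8       # out-of-scope sample member
MISSING = -9              # item reached but not answered


def apply_reserve_code(value, status, interview_type, in_abbreviated, routed_to_item):
    """Return the edited value for one third follow-up questionnaire variable.

    All argument names are hypothetical:
      status          -- 'respondent', 'nonrespondent', or 'out_of_scope'
      interview_type  -- 'full' or 'abbreviated'
      in_abbreviated  -- True if the item is on the abbreviated questionnaire
      routed_to_item  -- True if the instrument's routing logic reached the item
    """
    # Global rules based on participation status come first.
    if status == "nonrespondent":
        return NONRESPONDENT
    if status == "out_of_scope":
        return COMPONENT_SKIP
    # Conditional rules for respondents (ordering is an assumption).
    if interview_type == "abbreviated" and not in_abbreviated:
        return NOT_ADMINISTERED
    if not routed_to_item:
        return LEGITIMATE_SKIP
    if value is None:
        return MISSING
    return value
```

For example, a respondent to the abbreviated questionnaire would receive -7 on an item that appeared only in the full-length questionnaire, while a full-length respondent routed past the same item would receive -3.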

5.3.3 Filling Forward Known Information 

In addition to the application of reserve codes, editing programs were also developed to 
fill forward known information into items/variables which were legitimately skipped during 
survey administration. For example, quite a number of third follow-up respondents were not 
administered the question (corresponding to variable F3B09), “Since January 2006, have you 
ever held a job for pay?” This was primarily because the majority of respondents had already 
indicated during initial survey questions about their current activities (i.e., F3A01A — F3A01H) 
that they were currently working for pay, thereby making administration of F3B09 unnecessary. 
Editing programs set the variable F3B09 to “yes” for all respondents who indicated that their 
current activities included some form of paid employment. 

Instances of this filling forward of known information can be identified by using the 
“administered to” and “applies to” information included in the ECB, Education Data Analysis 
Tool (EDAT), and PowerStats data analysis tool documentation for each third follow-up 
questionnaire variable (i.e., variables following the naming convention F3A##, F3B##, F3C##, 
or F3D##). Specifically, the “administered to” information describes the set of respondents who 
actually saw/heard the associated questionnaire item, while the “applies to” information 
describes the set of respondents who have nonmissing data for the particular variable; in 
instances where the “applies to” text describes a superset of the respondents described in the 
“administered to” text, information has been filled forward. See figure 6 (which shows the 
documentation window for variable F3B09) as an example of the distinction between 
“administered to” text and “applies to” text. 

19 Values of -5 were used to indicate the suppression of values for third follow-up restricted-use variables appearing on public- 
use data products. For those variables, the label for -5 will read “Suppressed.” It should be noted that in prior rounds, values of -5 
were used to identify out-of-range responses, so the label for -5 in those instances will read “Out of Range.” 

20 Values of -7 are labeled as “not administered — abbreviated interview or partial interview break-off” for all variables across all 
study rounds. However, data users should note that for second follow-up variables, -7 indicates a missing value resulting from 
partial interview break-off; meanwhile, for third follow-up variables, -7 is applied — in cases where the sample member 
responded to the third follow-up abbreviated questionnaire — for items/variables that were included in the full-length 
questionnaire only. 

Figure 6. Documentation window for variable F3B09 

Variable Name: F3B09 

Record #1, Position: 8511-8512, Format: N2. 

Variable Label: Held a job for pay since January 2006 
Variable Description: 

- Since January 2006, have you ever held a job for pay? 

1=Yes 

0=No 

NOTE: Administered to ELS:2002 third follow-up respondents who had not already provided evidence of working for pay or serving 
in the military (i.e., respondents where F3A01A ≠ 1, F3A01B ≠ 1, F3A01H ≠ 1, F3A03A ≠ 1, F3A03B ≠ 1, F3A03H ≠ 1, and F3B01 ≠ 
1). Applies to all ELS:2002 third follow-up respondents (F3B09 is set to 1 for cases where F3A01A = 1, F3A01B = 1, F3A01H = 1, 
F3A03A = 1, F3A03B = 1, F3A03H = 1, or F3B01 = 1). 

SOURCE: U.S. Department of Education, National Center for Education Statistics. Education Longitudinal Study of 2002 
(ELS:2002) Third Follow-up. 
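
The fill-forward rule documented in figure 6 can be expressed as a short function. The variable names come from the figure note; the function name and the dictionary record layout are hypothetical, and the sketch is written in Python for illustration rather than in the SAS used for the actual editing programs.

```python
# Items that, when equal to 1, give evidence of paid work or military service.
EVIDENCE_ITEMS = ("F3A01A", "F3A01B", "F3A01H", "F3A03A", "F3A03B", "F3A03H", "F3B01")


def fill_forward_f3b09(record):
    """Return a copy of the record with F3B09 set to 1 (yes) when any
    evidence item equals 1; otherwise leave F3B09 as it was."""
    edited = dict(record)
    if any(record.get(item) == 1 for item in EVIDENCE_ITEMS):
        edited["F3B09"] = 1
    return edited


# Hypothetical respondent who reported currently working, so F3B09 was skipped.
skipped = {"F3A01A": 1, "F3A01B": 0, "F3B09": -3}
filled = fill_forward_f3b09(skipped)

# Hypothetical respondent with no prior evidence who answered F3B09 directly.
asked = {"F3A01A": 0, "F3A01B": 0, "F3B01": 0, "F3B09": 0}
unchanged = fill_forward_f3b09(asked)
```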

5.3.4 Addressing Inconsistent Responses 

Editing programs were developed to output inconsistent items across logical patterns 
within the questionnaire. These items were reviewed, and in some cases, edits were made to 
either correct previously answered (or unanswered) questions to match the dependent items or 
blank out subsequent items to stay consistent with previously answered items. Examples of 
inconsistent responses which were edited include small numbers of respondents who provided a 
last-employed date which preceded the date they began their most recent job; in these instances, 
one or both dates were edited based on other information provided by the respondent (e.g., the 
number of weeks they reported being employed during 2010, 2009, or 2008). Other examples of 
inconsistent responses which were edited include mismatches between the date of first 
postsecondary attendance and the date of last postsecondary attendance; mismatches between the 
date a postsecondary credential was earned and the date the credential-awarding institution was 
first attended; mismatches between the type of postsecondary credential earned and the 
institutional level of the credential-awarding institution; and mismatches between the number of 
hours per week respondents reported working at their current job, and whether they identified 
that job as a full-time job (35 hours/week or more) or a part-time job (fewer than 35 hours/week) 
when reporting their current activities. 
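
Two of the consistency checks described above can be sketched as follows. The sketch only flags records for review, since the edits themselves were decided after review of other information supplied by the respondent. The record keys are hypothetical; the 35-hour threshold is the one stated in the text.

```python
from datetime import date

FULL_TIME_HOURS = 35  # full-time is 35 hours/week or more


def find_inconsistencies(record):
    """Return a list of flags for one respondent record (hypothetical keys)."""
    flags = []

    start = record.get("job_start")
    last = record.get("last_employed")
    if start is not None and last is not None and last < start:
        flags.append("last-employed date precedes job start date")

    hours = record.get("hours_per_week")
    full_time = record.get("reported_full_time")
    if hours is not None and full_time is not None:
        if full_time != (hours >= FULL_TIME_HOURS):
            flags.append("hours per week conflict with full-time/part-time status")

    return flags


# Hypothetical records: one with both problems, one consistent.
problem = {
    "job_start": date(2011, 5, 1),
    "last_employed": date(2010, 12, 1),
    "hours_per_week": 20,
    "reported_full_time": True,
}
clean = {
    "job_start": date(2010, 1, 15),
    "last_employed": date(2012, 6, 30),
    "hours_per_week": 40,
    "reported_full_time": True,
}
problem_flags = find_inconsistencies(problem)
clean_flags = find_inconsistencies(clean)
```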

5.4 Coding 

The survey instrument collected data on major fields of study, occupations, and 
postsecondary institutions, all of which required coding. The survey instrument included 
applications which allowed respondents or interviewers to code text strings to widely used 
taxonomies. All text strings that were not coded during the interview were coded as part of data 
processing. This section describes the types of data requiring coding, the coding applications, the 
coding process, quality control procedures, and measures of coding quality. 

5.4.1 Major Field of Study Coding 

Respondents identified the primary and secondary (when applicable) fields of study for 
each credential reported to have been earned. The instruments included a coding application that 
allowed major coding using the NCES 2010 Classification of Instructional Programs (CIP) 21 
taxonomy. On the restricted-use third follow-up student-institution data file, researchers will find 
both a 2-digit version and a 6-digit version of the CIP code for fields of study. 

• F3ICREDGEN_1 — Credential #1 (highest or only credential from the given institution): 
field-of-study 2-digit (general) code 

• F3ICREDSPE_1 — Credential #1: field-of-study 6-digit (specific) code 

• F3ICREDGEN2_1 — Credential #1: second major field-of-study 2-digit (general) code 

• F3ICREDSPE2_1 — Credential #1: second major field-of-study 6-digit (specific) code 

• F3ICREDGEN_2 — Credential #2 (additional credential from the given institution): field- 
of-study 2-digit (general) code 

• F3ICREDSPE_2 — Credential #2: field-of-study 6-digit (specific) code 

• F3ICREDGEN2_2 — Credential #2: second major field-of-study 2-digit (general) code 

• F3ICREDSPE2_2 — Credential #2: second major field-of-study 6-digit (specific) code 

Only the 2-digit versions of these variables appear on the public-use data file. 

5.4.1.1 Major Field of Study Coding Methods 

To use the coding application, respondents or interviewers first entered text to describe 
the field of study. Then, a list of majors, customized based on the text string, was presented. The 
respondent or interviewer could choose one of the options listed, or choose “none of the above.” 
If “none of the above” was selected, a two-tiered dropdown menu appeared. The first dropdown 
menu contained a general list of majors; the second was more specific and was dependent on the 
first. Interviewers were trained to use probing techniques to assist in the coding process via the 
instrument-provided coding tool. Self-administered web respondents were provided supporting 
text on-screen. If the respondent or interviewer was unable to find a good match, he or she could 
proceed with the interview without selecting a code. In this case, the text string and any 
selections from the dropdown menus were retained. 

21 http://nces.ed.gov/ipeds/cipcode/ 

All major text strings that were not coded during the interview were processed by RTI. 
First, the major text strings that appeared more than once were assigned a code by an expert 
coder. This code was then applied to all other exact-matching text strings to ensure consistency 
of codes for duplicate text strings. The remainder of the text strings was “upcoded” to the CIP 
taxonomy by coding experts using an application that used the same search function as the 
application in the instruments. The coding expert could assign a CIP field of study code or assign 
a value of 999999 to indicate that the text string was too vague to code. 
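
The first upcoding step (code each duplicated string once, then apply that code to every exact match) can be sketched as below. Function names and text strings are hypothetical, the codes are placeholders for illustration rather than verified CIP assignments, and storing 999999 as a string is an assumption of the sketch.

```python
from collections import Counter

TOO_VAGUE = "999999"  # value assigned when a string is too vague to code


def duplicated_strings(strings):
    """Return the text strings that appear more than once, in sorted order."""
    counts = Counter(strings)
    return sorted(text for text, n in counts.items() if n > 1)


def apply_codes(strings, expert_codes):
    """Pair every string with the expert's code for its exact match.

    Strings the expert has not yet coded are paired with None and remain
    in the queue for individual upcoding.
    """
    return [(text, expert_codes.get(text)) for text in strings]


# Hypothetical uncoded strings left over from the interview.
uncoded = ["nursing", "business", "nursing", "stuff", "stuff"]
to_code_once = duplicated_strings(uncoded)

# Hypothetical expert decisions for the duplicated strings only.
expert_codes = {"nursing": "51.3801", "stuff": TOO_VAGUE}
coded = apply_codes(uncoded, expert_codes)
```

Because the lookup is by exact text, two occurrences of the same string can never receive different codes, which is the consistency property the procedure is designed to guarantee.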

5.4.1.2 Major Field of Study Coding Quality Control Procedures and Results 

To evaluate the quality of the coding completed during the interview, a random sample of 
approximately 10 percent of the pairs of verbatim strings and codes was selected for recoding 
and analysis. RTI coding personnel evaluated text strings and assigned codes without knowledge 
of the codes that were selected during the interview. If the code selected differed from the code 
assigned during the interview, the coding expert was then shown both codes. The coding expert 
was instructed to only override the code selected during the interview if it was clearly incorrect. 
When a code was overridden, the new code was included on the data file in place of the original 
code. Text strings were designated “too vague to code” when they lacked sufficient clarity or 
specificity. 

Results of recoding of strings coded during the interview are given in table 17. Nearly all 
of the codes selected during the interview were deemed to be accurate to the most detailed 6- 
digit level (97 percent). The coding expert disagreed with the CIP code selected during the 
interview for only 3 percent of the strings. In these instances, the code selected by the expert 
coder replaced the code selected during the interview. 

Table 17. Match/disagree results of quality control recoding and upcoding of major: 2012 


Results for sample of strings coded during interview    Percent
Match at 6-digit and 2-digit                               97.0
Match at 2-digit, but not 6-digit                           0.1
Disagree                                                    2.9


NOTE: Detail may not sum to totals because of rounding. 

SOURCE: U.S. Department of Education, National Center for Education Statistics. Education Longitudinal Study of 2002 
(ELS:2002) Third Follow-up. 
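
The three result categories in table 17 can be illustrated with a small classifier that compares the code chosen during the interview with the code assigned by the expert. The sketch assumes codes are written as "XX.XXXX" strings and treats a match as exact equality, which is stricter than the review rule described above (the expert overrode only clearly incorrect codes). Function names and code pairs are hypothetical.

```python
def classify_recode(interview_code, expert_code):
    """Classify agreement between two CIP codes written as 'XX.XXXX'."""
    if interview_code == expert_code:
        return "Match at 6-digit and 2-digit"
    if interview_code[:2] == expert_code[:2]:
        return "Match at 2-digit, but not 6-digit"
    return "Disagree"


def percent_distribution(pairs):
    """Return the percent of code pairs falling in each category."""
    counts = {}
    for interview_code, expert_code in pairs:
        label = classify_recode(interview_code, expert_code)
        counts[label] = counts.get(label, 0) + 1
    total = len(pairs)
    return {label: round(100.0 * n / total, 1) for label, n in counts.items()}


# Hypothetical quality control sample of (interview code, expert code) pairs.
sample = [
    ("51.3801", "51.3801"),
    ("51.3801", "51.3899"),
    ("51.3801", "52.0201"),
    ("52.0201", "52.0201"),
]
distribution = percent_distribution(sample)
```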


5.4.2 Occupation Coding 

The ELS:2002 third follow-up data collection instrument included occupation coding 
tools that facilitated coding of literal responses of occupation job title and duties to the 2010 
Standard Occupational Classification 22 (SOC) taxonomy. Occupation job title and duties were
matched to occupation descriptions from the Occupational Information Network (O*NET). 23
Interview respondents were asked to identify their current occupation and the occupation they 
anticipated having at age 30. For technical information on these variables, see appendix E. 

On the restricted-use data file, researchers will find both a 2-digit version and a 6-digit
version of the O*NET code for respondents’ occupations:

• F3ONET2CURR: 2-digit O*NET code for current/most recent job

• F3ONET6CURR: 6-digit O*NET code for current/most recent job

• F3ONET2AGE30: 2-digit O*NET code for expected age-30 occupation

• F3ONET6AGE30: 6-digit O*NET code for expected age-30 occupation

Only the 2-digit versions of these variables appear on the public-use data file.

5.4.2.1 Occupation Coding Methods 

Respondents were asked to provide a job title and job duties for each occupation. All 
respondents completing the English version of the web interview were also asked to code the 
occupation to the SOC taxonomy. To code the occupation, interviewers or respondents who were 
self-administering the web questionnaire first entered the job title and duties. These strings were 
automatically matched to the occupation descriptions from O*NET and a customized list of
occupations was presented. Interviewers or respondents could choose one of the options listed, or 
choose “none of the above.” In the occupation coding application, selecting “none of the above” 
presented the coder with a set of three sequential dropdown menus, each with choices increasing 
in their level of specificity. The first dropdown menu contained a general list of occupations. The 
options presented in the second dropdown were dependent on the code selected in the first. Some 
selections from the second dropdown required coders to make a selection from a third even more 
detailed dropdown menu. Interviewers were trained to use probing techniques to assist in the 
coding process via the instrument-provided coding tool. Self-administered web respondents were 
provided supporting text on screen. If the respondent or interviewer was unable to find an 
appropriate SOC code for the occupation, they could proceed with the interview without 
selecting a code. In this case, the text string and any selections from the dropdown menus were 
retained to assist with coding during data processing. 

RTI’s coding experts attempted to code all occupations that were not coded in the web
interview. This “upcoding” was completed using an application that used the same search
function as the application in the instrument. The coding expert could assign an O*NET code or
assign a value of 999999 to indicate that the text string was too vague to code.
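
Analysts working with the coded occupation variables will usually want to set the “too vague to code” value aside before tabulating. The following is a minimal sketch only. It assumes that the 6-digit code is stored as an integer or digit string and that the 2-digit code corresponds to the first two digits of the 6-digit code; neither assumption is confirmed by this documentation, and the function names are hypothetical.

```python
from typing import Optional

TOO_VAGUE = "999999"  # value assigned when the text string was too vague to code


def normalize_code(raw) -> Optional[str]:
    """Return the code as a 6-character string, or None if it was too vague to code."""
    code = str(raw).strip().zfill(6)
    if code == TOO_VAGUE:
        return None
    return code


def two_digit_group(raw) -> Optional[str]:
    """Assumed derivation: the 2-digit code is the first two digits of the 6-digit code."""
    code = normalize_code(raw)
    return None if code is None else code[:2]


# Hypothetical values for illustration only.
print(two_digit_group("151132"))
print(two_digit_group(999999))
```

Other reserve codes that may appear on the data file (for example, legitimate skips) are not handled here and would need their own treatment.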


22 http://www.bls.gov/soc/200Q/socstmc.pdf

23 http://www.onetcenter.org/overview.html

5.4.2.2 Occupation Coding Quality Control Procedures and Results

Coding experts evaluated the quality of coding that was completed during the interview 
by recoding a random sample of approximately 10 percent of the occupation text strings. To 
recode the selected occupations, RTI staff members worked with a coding application which 
used the same search function as the application in the instruments. These coding experts 
evaluated text strings and assigned codes without knowledge of the codes that were selected 
during the interview. If the code selected differed from the code assigned during the interview, 
the coding expert was then shown both codes. The coding expert was instructed to only override 
the code selected during the interview if it was clearly incorrect. When the expert coder did not 
agree with the 6-digit code selected during the interview, the new code was included on the data 
file in place of the original code. Text strings were designated “too vague to code” when they
lacked sufficient clarity or specificity. 

The expert coders agreed with the 6-digit code selected during the interview for 95 
percent of the text strings reviewed and agreed with the 2-digit code (but not the 6-digit code) for 
less than 1 percent of the text strings reviewed, for a total of approximately 96 percent agreement 
at the 2-digit level. The expert coder disagreed with the code selected during the interview for 
approximately 4 percent of the occupations. The results are shown in table 18. 


Table 18. Match/disagree results of quality control recoding and upcoding of occupation: 2012

Results for sample of strings coded during interview    Percent
Match at 6-digit and 2-digit                               94.7
Match at 2-digit, but not 6-digit                           0.9
Disagree                                                    4.5


NOTE: Detail may not sum to totals because of rounding. 

SOURCE: U.S. Department of Education, National Center for Education Statistics. Education Longitudinal Study of 2002 
(ELS:2002) Third Follow-up. 
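
The three result categories used in the quality control tables (match at both levels, match at the 2-digit level only, disagree) can be tallied from pairs of interview and expert codes. The sketch below is illustrative: the code pairs are invented, the function names are hypothetical, and it assumes that the 2-digit code is the first two digits of the 6-digit code.

```python
from collections import Counter

CATEGORIES = ("match_6_and_2", "match_2_only", "disagree")


def classify(interview_code: str, expert_code: str) -> str:
    """Classify one pair of 6-digit codes into a quality control category."""
    if interview_code == expert_code:
        return "match_6_and_2"
    if interview_code[:2] == expert_code[:2]:
        return "match_2_only"
    return "disagree"


def agreement_percentages(pairs):
    """Return the percentage of pairs falling into each category."""
    counts = Counter(classify(a, b) for a, b in pairs)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no code pairs supplied")
    return {cat: 100.0 * counts[cat] / total for cat in CATEGORIES}


# Hypothetical code pairs, not ELS:2002 data.
pairs = [
    ("151132", "151132"),
    ("151132", "151131"),
    ("291141", "252021"),
    ("111021", "111021"),
]
print(agreement_percentages(pairs))
```

Agreement at the 2-digit level is then the sum of the first two categories, which is how the approximately 96 percent figure for occupations is obtained from 94.7 and 0.9.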


5.4.3 Postsecondary Institution (Integrated Postsecondary Education Data System [IPEDS]) Coding

Respondents were asked to identify each postsecondary institution attended since leaving 
high school. The postsecondary institution IDs are available only on the restricted-use third 
follow-up student-institution data file: 

• F3IIPED — IPEDS code of attended postsecondary institution 

5.4.3.1 Postsecondary Institution Coding Methods and Results 

Respondents were asked to identify the postsecondary institutions attended using a look- 
up tool. After respondents (or the interviewer) entered the institution’s name, city, and state into 
the web survey, they could search an instrument-provided look-up tool containing institutions 
from the 2010-11 IPEDS for the appropriate match. When a match was not found, the 
respondent was asked to provide the institution’s level (i.e., 4-year, 2-year, less-than-2-year) and 
control (i.e., public, private not-for-profit, private-for-profit). This information was later used to
assist RTI staff in finding a match in IPEDS as part of data processing. 

Text strings not coded by respondents through the instrument-provided look-up tool were 
provided to coding experts to be upcoded in the following manner. First, cases were compared 
against the 2002-11 IPEDS database for matching. Any case with school name, city, and state
that exactly matched an IPEDS record was assigned the corresponding IPEDS ID. Then, any text
strings that remained uncoded were loaded into the coding application for an RTI coding expert
to assign IDs. As shown in table 19, 89 percent of all text strings were coded during the
interview and 11 percent were coded by expert coders at RTI.

Table 19. Postsecondary institution names by IPEDS coding method: 2012

Results                     Number    Percent
Total                        9,424      100.0
Coded during interview       8,373       88.8
Coded by expert coders       1,044       11.1


NOTE: Detail may not sum to totals because of rounding. IPEDS = Integrated Postsecondary Education Data System. 
SOURCE: U.S. Department of Education, National Center for Education Statistics. Education Longitudinal Study of 2002 
(ELS:2002) Third Follow-up. 
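
The first upcoding step for institutions, in which cases whose school name, city, and state exactly match an IPEDS record receive that record's ID and the remainder go to an expert coder, can be sketched as follows. This is an illustration under stated assumptions, not RTI's application: the record layout, the field names, and the case and whitespace normalization are all hypothetical, and the example records are invented.

```python
def normalize(text):
    """Upper-case and collapse whitespace (an assumed cleaning step, not from the source)."""
    return " ".join(text.upper().split())


def build_lookup(ipeds_records):
    """Index IPEDS records by (name, city, state)."""
    return {
        (normalize(r["name"]), normalize(r["city"]), normalize(r["state"])): r["ipeds_id"]
        for r in ipeds_records
    }


def exact_match(cases, lookup):
    """Split cases into exact matches (case_id -> IPEDS ID) and cases left for an expert coder."""
    coded, uncoded = {}, []
    for case in cases:
        key = (normalize(case["name"]), normalize(case["city"]), normalize(case["state"]))
        if key in lookup:
            coded[case["case_id"]] = lookup[key]
        else:
            uncoded.append(case)
    return coded, uncoded


# Hypothetical records for illustration only; these are not real IPEDS entries.
ipeds = [
    {"name": "Example State University", "city": "Springfield", "state": "XX", "ipeds_id": "000001"},
]
cases = [
    {"case_id": 1, "name": "example state  university", "city": "Springfield", "state": "xx"},
    {"case_id": 2, "name": "Example St Univ", "city": "Springfield", "state": "XX"},
]
coded, uncoded = exact_match(cases, build_lookup(ipeds))
print(coded, [c["case_id"] for c in uncoded])
```

Abbreviated or misspelled names, such as the second case, fall through to the manual step, which matches the two-stage process described in the text.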


After all coding was complete, 95 percent of the institutions had been assigned an IPEDS 
ID. Less than 1 percent of institutions were foreign, and 4 percent were uncodeable, which 
usually meant the string was not a postsecondary institution. 

5.5 Construction of Scales 

Certain sets of items that appear in the survey were designed to be analyzed as social- 
cognitive career theory-based scales. Scales need to be unidimensional in terms of their factor 
structure, need to be reliable, and need to increase the reliability above that which could be 
obtained through use of a single item for measurement purposes. The survey includes, for 
example, questions related to job satisfaction and the nature of the individuals the respondent 
works with. 

Prior to constructing the scales, questionnaire responses were subjected to data cleaning 
procedures discussed previously; in addition, as an interim step in scale construction, some 
questionnaire items were temporarily reverse coded (that is, positively and negatively worded 
items were coded to reflect the same direction on the construct) to equate larger scale values with 
positive attributes (e.g., higher levels of job satisfaction). Once the data were finalized, the 
(weighted) reliability of the scale items was evaluated using Cronbach’s alpha. Weighted scales
were then created with SAS® PROC FACTOR if the associated items met a minimum alpha level
of between 0.8 and 0.9 (depending on the variable), and the scales were standardized to have a
mean of zero and a (weighted) standard deviation of one. Scales were set to missing if any of the
scale items were missing. The individual item-level data are also available on the data file.
Researchers are encouraged to further examine the psychometric properties of the scales using
the item-level data. The scales presented on the data file are just one way to combine the
information. Individual researchers are able to recreate these or similar scales.
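
Researchers who wish to examine or recreate similar scales from the item-level data might start from something like the sketch below. It is illustrative only and is not the procedure used to build the scales on the file: it combines items with a simple mean rather than the factor scores produced by SAS PROC FACTOR, it uses the population form of the weighted variance, and the function names are hypothetical. It does follow the rules stated above in two respects: the scale is missing when any item is missing, and the result has a weighted mean of zero and a weighted standard deviation of one.

```python
import numpy as np


def weighted_var(x, w):
    """Weighted variance (population form: divides by the sum of the weights)."""
    mean = np.average(x, weights=w)
    return np.average((x - mean) ** 2, weights=w)


def weighted_cronbach_alpha(items, w):
    """Cronbach's alpha for an (n, k) array of complete cases with weights w of length n."""
    items = np.asarray(items, dtype=float)
    w = np.asarray(w, dtype=float)
    k = items.shape[1]
    sum_item_vars = sum(weighted_var(items[:, j], w) for j in range(k))
    total_var = weighted_var(items.sum(axis=1), w)
    return (k / (k - 1)) * (1.0 - sum_item_vars / total_var)


def standardized_scale(items, w):
    """Item mean, standardized to weighted mean 0 and SD 1; NaN if any item is missing."""
    items = np.asarray(items, dtype=float)
    w = np.asarray(w, dtype=float)
    complete = ~np.isnan(items).any(axis=1)
    raw = np.full(items.shape[0], np.nan)
    raw[complete] = items[complete].mean(axis=1)
    mean = np.average(raw[complete], weights=w[complete])
    sd = np.sqrt(weighted_var(raw[complete], w[complete]))
    return (raw - mean) / sd


# Hypothetical item responses and weights, not ELS:2002 data.
items = np.array([[1, 2, 3], [2, 2, 2], [np.nan, 1, 1], [4, 5, 3]], dtype=float)
weights = np.array([1.0, 2.0, 1.0, 1.0])
print(standardized_scale(items, weights))
```

Alpha should be computed on complete cases only, since `weighted_cronbach_alpha` does not itself handle missing values.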

Three scales were created at the individual respondent level: a work
support index/scale score (F3JOBSUPP), a job satisfaction index/scale score (F3JOBSATIS), and
an index/scale score for job persistence intentions (F3JOBPERSIST). These scales
are summarized in table 20. The scales were first developed in the third follow-up field test. More
information about these scales and about scale and construct development may be found in the
ELS:2002 Third Follow-up Field Test Report (Ingels et al. 2012, NCES 2012-03).

Table 20. Summary information for scales: 2012

Student scale                                Variable name               Cronbach’s alpha
F3JOBSUPP: Work Support                      F3B34A people supportive                0.80
                                             F3B34B learn from them
                                             F3B34C people helpful
F3JOBSATIS: Job Satisfaction                 F3B34D well satisfied                   0.90
                                             F3B34E enthusiastic
                                             F3B34F enjoy work
F3JOBPERSIST: Job Persistence Intentions     F3B34G plans to stay                    0.89
                                             F3B34H not leaving
                                             F3B34I committed to job

SOURCE: U.S. Department of Education, National Center for Education Statistics. Education Longitudinal Study of 2002 
(ELS:2002) Third Follow-up. 

Chapter 6.
Weighting, Imputation, Bias Assessment, and Design Effects

6.1 Overview of Weighting, Imputation, Bias Assessment, and Design Effects

Implicitly building on the sample design discussion in chapter 3, chapter 6 describes the 
study populations of the Education Longitudinal Study of 2002 (ELS:2002) and describes the 
statistical activities designed to support analysis of the study populations. In particular, the 
weighting, imputation, and bias assessment procedures and the design effects for the third 
follow-up are described in detail. Information about these activities in prior rounds may be found 
in the corresponding prior-round documentation: the base-year data file user’s manual (NCES 
2004-405), the first follow-up data file documentation (NCES 2006-344), and the base-year to 
second follow-up data file documentation (NCES 2008-347). 

The general purpose of ELS:2002 weighting was to compensate for unequal probabilities 
of selection and to adjust for the fact that not all individuals selected into the sample actually 
participated. Chapter 6 describes the weights developed for the third follow-up and documents 
the statistical properties of the weights. The third follow-up sample weights and the details of 
their construction are described in section 6.2. The design effect is a measure of sample 
efficiency. More specifically, the design effect is the ratio of the true variance of a statistic 
(taking the complex sample design into account) to the variance of that statistic for a simple 
random sample with the same number of cases. Design effects for the third follow-up are 
explored in section 6.3. Because no single design effect is universally applicable to any given
survey or analysis, the chapter also reports design effects for different subgroups and statistics.
Imputation attempts to address the issue of item nonresponse by providing a procedure that uses
available information and some assumptions to derive substitute values for the missing values in
a data file. The chapter provides further information on the key items that were subject to
imputation, the imputation procedures, and the results of imputation. The imputation of third
follow-up variables is explained in section 6.4. Methods used to address potential disclosure
issues are summarized in section 6.5. Bias assessment measures the degree to which unit and item
nonresponse introduce bias in estimates and the degree to which observed bias is reduced as a
result of weighting adjustments or responsive design. Bias assessment is covered in two sections:
unit and item nonresponse are discussed in section 6.6 and responsive design results are
discussed in section 6.7.
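
The definition of the design effect given above can be illustrated with a short calculation. This is a sketch with invented numbers; the function names are hypothetical, and the design-based variance would in practice come from survey software (for example, from replicate weights), not from the formula shown here for the simple random sample case.

```python
import math


def srs_variance_of_proportion(p: float, n: int) -> float:
    """Variance of a proportion under simple random sampling with n cases."""
    return p * (1.0 - p) / n


def design_effect(var_design: float, var_srs: float) -> float:
    """Ratio of the design-based variance to the simple random sample variance."""
    return var_design / var_srs


# Hypothetical example: an estimated proportion of 0.40 from 1,000 cases,
# with a design-based variance assumed to be 0.00048.
var_srs = srs_variance_of_proportion(0.40, 1000)
deff = design_effect(0.00048, var_srs)
deft = math.sqrt(deff)  # factor by which the standard error is inflated
print(round(deff, 2), round(deft, 2))
```

A design effect above 1 indicates that the complex design yields less precise estimates than a simple random sample of the same size, which is why standard errors computed as if the data were a simple random sample will generally be too small.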

6.2 Calculation of Third Follow-up Weights and Results of Weighting

A variety of topics are discussed in the following subsections. Sections 6.2.1 and 6.2.2 
provide a high-level overview of the ELS:2002 target populations, potential domains of analysis 
for those populations, and a description of the analysis weights created for the third follow-up. 
The model-based approach for weight adjustment is discussed in section 6.2.3 and details of the 
weight adjustment factors used to create the third follow-up analysis weights are given in 
sections 6.2.4 and 6.2.5. A discussion of the Balanced Repeated Replicate (BRR) weights 
produced for the third follow-up occurs in section 6.2.6 and a brief discussion of quality control 
methods used to produce the third follow-up weights may be found in section 6.2.7. 

6.2.1 Target Populations and Analysis Domains 

The sample design for ELS:2002 supports a number of analyses, which in turn permit 
accurate inferences to be made to three major groups or target populations: (1) Population A: 
Spring 2002 high school sophomores; (2) Population B: Spring 2004 high school seniors; and 
(3) Population C: Spring 2002 10th-grade schools.

Figure 7 illustrates that whereas some students are in only population A or population B, 
many students are in both populations — that is, both a Spring 2002 sophomore and a Spring 2004 
senior. 


Figure 7. Student analysis populations, by year: 2004

[Figure not reproduced. Recoverable label: Population B, Spring 2004 12th-grade students.]


SOURCE: U.S. Department of Education, National Center for Education Statistics, Education Longitudinal Study of 2002 
(ELS:2002), First Follow-up, 2004. 


Within these three target populations are a variety of important subpopulations. Although 
these subpopulations are subsets of the three target populations, the ELS:2002 sample design 
does not guarantee that the ELS:2002 sample will be representative of all subsets of the three 
primary target populations. The following lists give examples of subpopulations as subsets of the 
three target populations. 

Population A: Spring 2002 10th-grade students:

• Subpopulations: 24

- Spring 2002 10th-grade students capable of completing the student questionnaire
- Spring 2002 10th-grade students in base-year school in Spring 2004
- Spring 2002 10th-grade students in a different school in Spring 2004 (transfers)
- Spring 2002 10th-grade students who were dropouts in Spring 2004
- Spring 2002 10th-grade students who graduated or achieved equivalency early
  (i.e., prior to March 15, 2004)
- Spring 2002 10th-grade students who graduated by August 31, 2004
- Spring 2002 10th-grade students who were home-schooled in Spring 2004 25
- Spring 2002 White 10th-grade students
- Spring 2002 Black 10th-grade students
- Spring 2002 Hispanic 10th-grade students
- Spring 2002 Asian 10th-grade students
- Spring 2002 public school 10th-grade students
- Spring 2002 private school 10th-grade students

Population B: Spring 2004 12th-grade students:

• Subpopulations:

- Spring 2004 12th-grade students capable of completing the student questionnaire
- Spring 2004 12th-grade students who were graduating high school seniors in
  Spring 2004
- Spring 2004 12th-grade students who graduated by August 31, 2004
- Spring 2004 White 12th-grade students
- Spring 2004 Black 12th-grade students
- Spring 2004 Hispanic 12th-grade students
- Spring 2004 Asian 12th-grade students
- Spring 2004 public school 12th-grade students
- Spring 2004 private school 12th-grade students

Population C: Spring 2002 10th-grade schools:

• Subpopulations:

- School type: public, Catholic, and other private
- Urbanicity: urban, suburban, and rural
- Region: Northeast, Midwest, South, West

24 The subpopulations listed are important but are not the only possible subpopulations.

25 Although conceptually Spring 2002 sophomores who were homeschooled in 2004 may be thought of as an analysis population,
they were not designed to be so and were therefore not subject to minimum sample size requirements. The group is of limited
analytic utility owing both to the low sample size and to the narrowness of the population definition. The compelling practical
reason for distinguishing this group was so that they could be administered only those items consonant with their unique situation
as out-of-school students.

ELS:2002 student sample members were interviewed as part of third follow-up activities. 
Sample members who completed a certain prespecified proportion 26 of the third follow-up 
questionnaire were considered to be third follow-up respondents. ELS:2002 third follow-up 
respondents may be in population A (10th-grade cohort), in population B (12th-grade
cohort), or in both. To identify those respondents belonging to a particular target population, two
flag variables are provided. The flag variable G10COHRT denotes membership in the Spring
2002 10th-grade population and the flag variable G12COHRT 27 denotes membership in the
Spring 2004 12th-grade population.

Analytic uses of these three populations, and the weighting required to support the 
analyses, are discussed in section 6.2.2. 

6.2.2 Overview of Third Follow-up Analysis Weights 

Six sets of weights were computed for the third follow-up. These weights and the types of 
analyses that may be conducted using these weights are described in more detail below in 
sections 6.2.2.1 through 6.2.2.3. Although third follow-up student weights were created, no third
follow-up school weights were created. Discussion of school weights in the context of the
ELS:2002 third follow-up may be found in section 6.2.2.4.

The six sets of weights were conceptualized to support three basic kinds of analysis: 

(1) Estimates based on third follow-up data that include either students enrolled in 10th 
grade in the spring of 2002 (population A) or students enrolled in 12th grade in the 
spring of 2004 (population B). These weights are F3QWT and F3QTSCWT. They
are described in more detail in section 6.2.2.1.


26 In the base year and first follow-up of ELS:2002, sample members are considered respondents if they complete at least a 
certain proportion of the round-appropriate questionnaire or if they are “questionnaire incapable” for that round (although eligible 
for contextual data and transcripts). (Again, questionnaire-incapable students were those who could not be validly assessed or 
surveyed owing to severe disability or language barrier.) Sample members are considered respondents in the third follow-up if 
they complete at least a certain proportion of the third follow-up questionnaire. 

27 G12COHRT includes members of the senior cohort determined in the first follow-up (G12COHRT = 1) as well as those whose
membership status was determined in the second follow-up (G12COHRT = 2).

(2) Estimates based on third follow-up data in combination with base-year data (or other
prior rounds of data) that represent students enrolled in 10th grade in the spring of
2002 (population A). These weights are F3BYPNLWT and F3BYTSCWT. They are
described in more detail in section 6.2.2.2.

(3) Estimates based on third follow-up data in combination with first follow-up data (or
second follow-up data) where the estimates are meant to represent students enrolled
in 10th grade in the spring of 2002 (population A) or students enrolled in 12th grade
in the spring of 2004 (population B). These weights are F3F1PNLWT and
F3F1TSCWT. They are described in more detail in section 6.2.2.3.

These weights and the types of analyses that may be conducted using these weights are
described below. Although third follow-up student weights were created, no third follow-up
school weights were created. Discussion of school weights in the context of the ELS:2002 third
follow-up may be found in section 6.2.2.4.

6.2.2.1 F3QWT and F3QTSCWT Analytic Weights

F3QWT and F3QTSCWT were constructed to support analysis of third follow-up data 
alone (F3QWT) or in conjunction with high school (first follow-up) transcript data 
(F3QTSCWT). As noted in section 6.2.1 (figure 7), third follow-up respondents may be in the 
population of Spring 2002 10th-graders, may be in the population of Spring 2004 12th-graders,
or may be in both populations. Analyses designed to assess characteristics of one of the
populations must take care to restrict analyses to those third follow-up respondents in the
population of interest. To identify those third follow-up respondents who are members of the two
student populations, two flag variables, G10COHRT and G12COHRT, are available. Those third
follow-up respondents with a value of 1 for G10COHRT are members of the population of
Spring 2002 10th-graders. Those third follow-up respondents with a value of 1 (determined in
the first follow-up) or a value of 2 (determined in the second follow-up) for G12COHRT are
members of the population of Spring 2004 12th-graders.

• F3QWT when used in conjunction with G10COHRT supports estimates from third 
follow-up survey data representative of students enrolled in 10th grade in Spring 2002. 

• F3QWT when used in conjunction with G12COHRT supports estimates from third
follow-up survey data representative of students enrolled in 12th grade in Spring 2004.

• F3QTSCWT when used in conjunction with G10COHRT supports estimates from third 
follow-up survey data in combination with high school transcript data representative of 
students enrolled in 10th grade in Spring 2002. 

• F3QTSCWT when used in conjunction with G12COHRT supports estimates from third 
follow-up survey data in combination with high school transcript data representative of 
students enrolled in 12th grade in Spring 2004. 
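
The pairing of a weight with a cohort flag described in the list above can be sketched as a weighted mean restricted to the population of interest. This is a minimal illustration, not production survey code: the records and the outcome variable Y are invented, the helper names are hypothetical, a flag value of 0 for nonmembers is an assumption, and only a point estimate is produced (standard errors require the replicate weights or another design-based method).

```python
def is_10th_grade_cohort(rec):
    """Member of the Spring 2002 10th-grade population."""
    return rec["G10COHRT"] == 1


def is_12th_grade_cohort(rec):
    """Member of the Spring 2004 12th-grade population (status set in F1 or F2)."""
    return rec["G12COHRT"] in (1, 2)


def weighted_mean(records, value_key, weight_key, in_population):
    """Weighted mean of value_key over records in the population with a positive weight."""
    num = den = 0.0
    for rec in records:
        w = rec[weight_key]
        if w > 0 and in_population(rec):
            num += w * rec[value_key]
            den += w
    if den == 0:
        raise ValueError("no records with positive weight in the requested population")
    return num / den


# Hypothetical records, not ELS:2002 data.
records = [
    {"G10COHRT": 1, "G12COHRT": 1, "F3QWT": 100.0, "Y": 1.0},
    {"G10COHRT": 1, "G12COHRT": 0, "F3QWT": 300.0, "Y": 0.0},
    {"G10COHRT": 0, "G12COHRT": 2, "F3QWT": 100.0, "Y": 0.0},
]
print(weighted_mean(records, "Y", "F3QWT", is_10th_grade_cohort))
print(weighted_mean(records, "Y", "F3QWT", is_12th_grade_cohort))
```

The same weight yields different estimates for the two cohorts because the flag changes which respondents enter the calculation.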

If the analysis includes third follow-up data in combination with prior-round survey or
assessment data (other than simply high school transcript data), researchers should consult
sections 6.2.2.2, 6.2.2.3, and 7.4 to guide selection of an appropriate weight.

6.2.2.2 F3BYPNLWT and F3BYTSCWT Analytic Weights

F3BYPNLWT and F3BYTSCWT were constructed to support estimates of students
enrolled in 10th grade in Spring 2002. F3BYPNLWT was produced for all ELS:2002 sample
members who responded 28 in the base year and in the third follow-up. F3BYTSCWT was
produced for cases who responded in the base year and in the third follow-up and have high
school (first follow-up) transcript data on the file. It is not necessary to use the flag variable
G10COHRT in conjunction with F3BYPNLWT or F3BYTSCWT; by definition, only third
follow-up respondents who are members of the Spring 2002 10th-grade population will have a
non-zero value for these weights. 29
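
Because membership in the analysis population is carried by the weight itself for these two variables, a simple safeguard when using general-purpose software is to drop records whose weight is zero before computing anything, including unweighted counts. The sketch below is illustrative; the records are invented and the helper name is hypothetical.

```python
def positive_weight_cases(records, weight_key):
    """Keep only records with a positive value on the chosen analysis weight."""
    return [rec for rec in records if rec.get(weight_key, 0) > 0]


# Hypothetical records, not ELS:2002 data.
records = [
    {"STU_ID": "A", "F3BYPNLWT": 250.0},
    {"STU_ID": "B", "F3BYPNLWT": 0.0},
    {"STU_ID": "C", "F3BYPNLWT": 410.5},
]
analytic = positive_weight_cases(records, "F3BYPNLWT")
print(len(records), len(analytic))
```

Without this step, a zero-weight record contributes nothing to a weighted total but still inflates case counts and degrees of freedom in software that is not survey-aware.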

As longitudinal studies progress, the set of possible weights dramatically expands. 
Analytic weights produce estimates representative of specific populations and also adjust for 
patterns of potential differential nonresponse to survey components. F3BYPNLWT and 
F3BYTSCWT are intended to support analysis of third follow-up data in combination with
base-year data (F3BYPNLWT) and high school transcript data (F3BYTSCWT). For analysis with data
from the aforementioned components and any other combination of first follow-up and second 
follow-up data, selection of an appropriate analysis weight requires careful consideration of the 
analyses to be conducted, the data to be used, and the pattern of item nonresponse. For additional 
information on weight selection, researchers should also review section 7.4 of this 
documentation. 

6.2.2.3 F3F1PNLWT and F3F1TSCWT Analytic Weights

F3F1PNLWT and F3F1TSCWT were constructed primarily to support estimates of Spring
2004 12th-graders. However, these weights may also be used to support estimates of Spring 2002
10th-graders, depending on the data in the analysis and the research question. F3F1PNLWT
was produced for all ELS:2002 sample members who responded in the first follow-up and in the
third follow-up. F3F1TSCWT was produced for cases who responded in the first follow-up and
in the third follow-up and have high school (first follow-up) transcript data on the file.


28 Sample members who did not respond in the base year but did respond in the first follow-up were given a new participant 
supplement questionnaire to gather some of the same information that was collected on base-year respondents. Consequently, 
these base-year nonrespondents who responded in the first follow-up were treated as base-year respondents in the construction of 
first and second follow-up longitudinal weights. These sample members were treated as base-year respondents in the construction 
of third follow-up longitudinal weights. 

29 It is possible that statistical software not designed for the analysis of sample survey data may fail to exclude records that have
analysis weights of zero. The G10COHRT flag may be used to restrict analyses specifically to members of the 10th-grade cohort
to prevent such a situation.

As with F3QWT and F3QTSCWT, these weights need to be used in concert with a cohort
flag. Again, to identify those third follow-up respondents who are members of the two student
populations, two flag variables, G10COHRT and G12COHRT, are available. Those third follow-up
respondents with a value of 1 for G10COHRT are members of the population of Spring 2002
10th-graders. Those third follow-up respondents with a value of 1 (determined in the first follow-up)
or a value of 2 (determined in the second follow-up) for G12COHRT are members of the
population of Spring 2004 12th-graders. In other words, third follow-up weights for the 12th-grade
cohort are constructed for all sample members where G12COHRT equals 1 or 2, whereas
first follow-up weights are constructed for all sample members where G12COHRT equals 1.
Consequently, the 12th-grade cohort is identified by restricting sample members to those
individuals with a value of 1 or 2 for G12COHRT when using third follow-up weights. The
12th-grade cohort is identified by restricting sample members to those individuals with a value of
1 for G12COHRT when using first follow-up weights.

• F3F1PNLWT when used in conjunction with G10COHRT supports estimates from the
third follow-up survey in combination with first follow-up data representative of students 
enrolled in 10th grade in Spring 2002. 30 

• F3F1PNLWT when used in conjunction with G12COHRT supports estimates from the 
third follow-up survey in combination with first follow-up data representative of students 
enrolled in 12th grade in Spring 2004. 

• F3F1TSCWT when used in conjunction with G10COHRT supports estimates from the
third follow-up survey data in combination with first follow-up data and high school 
transcript data representative of students enrolled in 10th grade in Spring 2002. 

• F3F1TSCWT when used in conjunction with G12COHRT supports estimates from the 
third follow-up survey data in combination with first follow-up data and high school 
transcript data representative of students enrolled in 12th grade in Spring 2004. 
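
The rule that the 12th-grade cohort is defined by G12COHRT values of 1 or 2 under third follow-up weights, but by a value of 1 only under first follow-up weights, is easy to get wrong when switching between weights. One way to encode it is sketched below. The helper names are hypothetical, and the grouping of weight names into rounds is an illustration based on the weights named in this chapter rather than an exhaustive list.

```python
# Weights that must be paired with a cohort flag, grouped by the round that defines them.
THIRD_FOLLOW_UP_WEIGHTS = {"F3QWT", "F3QTSCWT", "F3F1PNLWT", "F3F1TSCWT"}
FIRST_FOLLOW_UP_WEIGHTS = {"F1QWT", "F1EXPWT"}


def g12_values_for(weight_name):
    """Return the G12COHRT values that identify the 12th-grade cohort for this weight."""
    if weight_name in THIRD_FOLLOW_UP_WEIGHTS:
        return {1, 2}
    if weight_name in FIRST_FOLLOW_UP_WEIGHTS:
        return {1}
    raise ValueError(f"no 12th-grade cohort rule recorded for weight {weight_name!r}")


def in_12th_grade_cohort(g12cohrt, weight_name):
    """True if a record with this G12COHRT value is in the cohort under the given weight."""
    return g12cohrt in g12_values_for(weight_name)


print(in_12th_grade_cohort(2, "F3F1PNLWT"))
print(in_12th_grade_cohort(2, "F1QWT"))
```

Raising an error for unlisted weights is a deliberate choice here: panel weights such as F3BYPNLWT represent the 10th-grade cohort only, so silently returning a rule for them would be misleading.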

Note that these two weights are designed to support analyses that incorporate data from 
the third follow-up, first follow-up, and high school transcript data collection. If an analysis also 
incorporates data from the base year or the second follow-up then these weights may not be 
preferred. For example, F3BYPNLWT and F3BYTSCWT may be preferred when base-year data 
are included in an analysis. Additional guidance on weight selection is provided in section 7.4. 

Again, as longitudinal studies progress, the set of possible weights dramatically expands.
Analytic weights produce estimates representative of specific populations and also adjust for
patterns of potential differential nonresponse to survey components. F3F1PNLWT and
F3F1TSCWT are intended to support analysis of third follow-up data in combination with first
follow-up data (F3F1PNLWT) and high school transcript data (F3F1TSCWT). For analysis with
data from the aforementioned components and in combination with second follow-up data or
base-year data, selection of an appropriate analysis weight requires careful consideration of the
analyses to be conducted, the data to be used, and the pattern of item nonresponse. For additional
information on weight selection, researchers should also review section 7.4 of this
documentation.

30 Please note that the use of multiple student questionnaires in the first follow-up gives rise to component skips that could result
in a particular analysis excluding members of the 10th-grade cohort. For example, analysis of dropouts as of the first follow-up
necessarily excludes other members of the 10th-grade cohort.

6.2.2.4 School Weights and the Third Follow-up

Although there are multiple data points for analysis in ELS:2002, the base-year school 
weight maintains generalizability only vis-a-vis the nation’s schools that contained 10th and 12th 
grades in Spring 2002. No additional school weight has been produced. Although additional 
information was collected from most of the base-year schools in the first follow-up, these data
were intended to be largely contextual, an extension of the student record, and by no means are 
the first follow-up school data generalizable to the nation’s schools in 2004. Nevertheless, one 
can look at the ELS:2002 base-year high schools 2 years later, using the base-year school weight 
as an anchor point from which to analyze change, or use student data collected in the first, 
second, or third follow-ups, in conjunction with analysis of the base-year schools, for which a 
proper weight has been provided. 

6.2.2.5 Third Follow-up Weights and Prior-round Weights

In both the base year and first follow-up of ELS:2002, some sample members were not 
able to complete the sample member questionnaires because of limited English proficiency or 
because of physical or mental limitations. However, information was collected from individuals, 
such as school administrators and teachers, associated with these sample members. In the base 
year and first follow-up, the set of respondents in each round combined with the set of sample 
members who were questionnaire-incapable in that round was referred to as the expanded sample 
for that round. Analysis weights were created both for respondents alone and for the
expanded sample. Expanded sample weights are included only in restricted-use files.

In the second and third follow-ups, however, questionnaire-incapable sample members 
participating in the respective follow-up were considered out of scope for constructing weights 
associated with that follow-up. For example, all questionnaire-incapable sample members in the 
third follow-up were excluded from the third follow-up weights. Third follow-up respondents 
who were questionnaire-incapable in prior rounds are included in third follow-up weights as long 
as they responded in the third follow-up. 

Table 21 summarizes the ELS:2002 analysis weights and associated universe flags, 
populations (described in section 6.2.1), and respondents. Additional guidance on selecting 
weights for analysis purposes is provided in section 7.4. 
Table 21. Relationship among weights, universe flags, populations, and respondents: 2002-2012 

| Weight | Universe flag | Population | Respondent |
|---|---|---|---|
| BYSTUWT | G10COHRT | A: Spring 2002 10th-grader | Fully or partially completed questionnaire in 2002 |
| BYEXPWT | G10COHRT | A: Spring 2002 10th-grader | Fully or partially completed questionnaire in 2002 or were incapable of completing a questionnaire |
| F1PNLWT | G10COHRT | A: Spring 2002 10th-grader | Fully or partially completed questionnaire in 2002 and 2004 (base-year data may be from the new participant supplement¹ or imputed) |
| F1XPNLWT | G10COHRT | A: Spring 2002 10th-grader | Fully or partially completed questionnaire in 2002 and 2004 (base-year data may be from the new participant supplement or imputed) or were incapable of completing a questionnaire in 2002 or 2004 |
| F1QWT | G10COHRT¹, G12COHRT | A: Spring 2002 10th-grader; B: Spring 2004 12th-grader | Fully or partially completed questionnaire in 2004 |
| F1EXPWT | G10COHRT, G12COHRT | A: Spring 2002 10th-grader; B: Spring 2004 12th-grader | Fully or partially completed questionnaire in 2004 or were incapable of completing a questionnaire in 2004 |
| F1TRSCWT | G10COHRT, G12COHRT | A: Spring 2002 10th-grader; B: Spring 2004 12th-grader | Fully or partially complete high school transcript data and fully or partially completed first follow-up or base-year questionnaire or were members of the expanded sample |
| F2QWT | G10COHRT, G12COHRT | A: Spring 2002 10th-grader; B: Spring 2004 12th-grader | Fully or partially completed questionnaire in 2006 |
| F2QTSCWT | G10COHRT, G12COHRT | A: Spring 2002 10th-grader; B: Spring 2004 12th-grader | Fully or partially completed questionnaire in 2006 and fully or partially complete high school transcript data |
| F2F1WT | G10COHRT, G12COHRT | A: Spring 2002 10th-grader; B: Spring 2004 12th-grader | Fully or partially completed questionnaire in 2004 and 2006 or were incapable of completing a questionnaire in 2004 and fully or partially completed questionnaire in 2006 |
| F2BYWT | G10COHRT | A: Spring 2002 10th-grader | Fully or partially completed questionnaire in 2002 and 2006 or were incapable of completing a questionnaire in 2002 and fully or partially completed questionnaire in 2006 |
| F3QWT | G10COHRT, G12COHRT | A: Spring 2002 10th-grader; B: Spring 2004 12th-grader | Fully or partially completed questionnaire in 2012 |
| F3QTSCWT | G10COHRT, G12COHRT | A: Spring 2002 10th-grader; B: Spring 2004 12th-grader | Fully or partially completed questionnaire in 2012 and fully or partially complete high school transcript data |
| F3BYPNLWT | G10COHRT | A: Spring 2002 10th-grader | Fully or partially completed questionnaire in 2002 and 2012 or were incapable of completing a questionnaire in 2002 and fully or partially completed questionnaire in 2012 |
| F3BYTSCWT | G10COHRT | A: Spring 2002 10th-grader | Fully or partially completed questionnaire in 2002 and 2012 or were incapable of completing a questionnaire in 2002 and fully or partially completed questionnaire in 2012 and fully or partially complete high school transcript data |
| F3F1PNLWT | G10COHRT, G12COHRT | A: Spring 2002 10th-grader; B: Spring 2004 12th-grader | Fully or partially completed questionnaire in 2004 and 2012 or were incapable of completing a questionnaire in 2004 and fully or partially completed questionnaire in 2012 |
| F3F1TSCWT | G10COHRT, G12COHRT | A: Spring 2002 10th-grader; B: Spring 2004 12th-grader | Fully or partially completed questionnaire in 2004 and 2012 or were incapable of completing a questionnaire in 2004 and fully or partially completed questionnaire in 2012 and fully or partially complete high school transcript data |

1 As noted in the ELS:2002 base-year to first follow-up data file documentation, base-year nonrespondents who responded in the 
first follow-up are considered to be members of the Spring 2002 10th-grade population, but there is no base-year weight (BYSTUWT 
or BYEXPWT) for them. The new participant supplement employed in the first follow-up ensured that the standard classification 
variables collected in the base year were also available for this group. However, key variables were imputed for base-year 
nonrespondents who were first follow-up respondents, so that these students could be analyzed as part of the sophomore cohort 
using F1PNLWT or F1XPNLWT. These students who are third follow-up respondents may also be analyzed as part of the 
sophomore cohort using F3F1PNLWT. 

NOTE: First follow-up sample freshening resulted in the inclusion of students who were members of the 12th-grade cohort but not 
the 10th-grade cohort, so the G10COHRT flag is required to restrict analyses to those first follow-up respondents who are members 
of the 10th-grade cohort. Similarly, the G12COHRT flag is required to restrict analyses to those first follow-up respondents who are 
members of the 12th-grade cohort. 

SOURCE: U.S. Department of Education, National Center for Education Statistics, Education Longitudinal Study of 2002 
(ELS:2002), Base Year to Third Follow-up, 2012. 
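
As a rough illustration of how a weight and a universe flag from table 21 are used together, the sketch below computes a weighted mean restricted to one cohort. Only the variable names F3QWT, G10COHRT, and G12COHRT come from the table; the record layout, the sample values, the outcome variable `y`, the helper name `weighted_mean`, and the assumption that a flag value of 1 marks cohort membership are hypothetical.

```python
from typing import Iterable, Mapping


def weighted_mean(records: Iterable[Mapping[str, float]],
                  value: str, weight: str, flag: str) -> float:
    """Weighted mean of `value` over records whose universe flag equals 1
    (assumed coding) and whose analysis weight is positive."""
    num = 0.0
    den = 0.0
    for rec in records:
        if rec[flag] == 1 and rec[weight] > 0:
            num += rec[weight] * rec[value]
            den += rec[weight]
    if den == 0:
        raise ValueError("no cases in the requested universe")
    return num / den


# Hypothetical records; real data files carry many more variables.
sample = [
    {"F3QWT": 200.0, "G10COHRT": 1, "G12COHRT": 1, "y": 1.0},
    {"F3QWT": 300.0, "G10COHRT": 1, "G12COHRT": 0, "y": 0.0},
    {"F3QWT": 100.0, "G10COHRT": 0, "G12COHRT": 1, "y": 1.0},
]

# Same weight, different universe flag, different population of inference.
sophomore_estimate = weighted_mean(sample, "y", "F3QWT", "G10COHRT")
senior_estimate = weighted_mean(sample, "y", "F3QWT", "G12COHRT")
```

The point of the sketch is that the weight alone does not define the population; the flag selects which cohort the estimate generalizes to.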


6.2.3 Overview of Nonresponse and Calibration Methodology 

All third follow-up analysis weights were created by applying a variety of weight 
adjustments to the third follow-up base weight (discussed in section 6.2.4). These weight 
adjustments were designed to account for three issues: 

• Some ELS:2002 sample members were not fielded for the third follow-up. 

• Some of the ELS:2002 sample members fielded for the third follow-up did not respond. 

• Application of weight adjustments to account for the first two issues resulted in weight 
sums for key analysis domains that differed from prior-round weight sums. 

The weight adjustments associated with the first and second issues are known as 
nonresponse adjustments. Two types of nonresponse occurring during third follow-up data 
collection were considered: nonresponse arising from the inability to locate or contact a sample 
member and nonresponse arising from sample member refusal to participate once contacted. 
After examining the number of nonresponse cases occurring because of refusal and the number 
occurring because of inability to locate or contact, a determination was made to treat all 
nonrespondents as one group because the predominant reason for nonresponse was refusal. 

The weight adjustments associated with the third issue are known as poststratification or 
calibration adjustments. Because the ELS:2002 third follow-up sample weights are not adjusted 
to sum to population totals, the adjustments associated with the third issue are referred to as 
calibration adjustments. 

In addition to the nonresponse and calibration adjustments described above, the third 
follow-up high school transcript weights (F3QTSCWT, F3BYTSCWT, and F3F1TSCWT) 
included a nonresponse adjustment followed by a calibration adjustment; the first 
adjustment accounted for nonresponse arising from the student’s school refusing to provide a 
transcript or the student’s refusal to allow the transcript information to be included with the 
ELS:2002 data, and the second adjustment calibrated weight sums to prior-round totals. 

Although there are several methods that may be used to adjust sampling weights to 
account for nonresponse and to calibrate weight sums, the method used to create the ELS:2002 
third follow-up analysis weights follows a model-based approach. An overview of this model- 
based approach is given below. Specific details of the nonresponse and calibration adjustments 
applied to produce the third follow-up analysis weights may be found in sections 6.2.5 and 6.2.6. 

6.2.3.1 Generalized Exponential Model 

All nonresponse and calibration adjustments were calculated using RTI’s proprietary 
generalized exponential modeling procedure (GEM) (Folsom and Singh 2000), which is similar 
to logistic modeling with bounds for adjustment factors. 

The GEM approach is a general version of weighting adjustments and was based on a 
generalization of Deville and Särndal’s (1992) logit model. GEM is not a competing method to 
weighting classes or logistic regression; rather, it is a method of creating weight adjustments that 
provides a wide variety of features and options that may be employed. It is a formalization of 
weighting procedures such as nonresponse adjustment, poststratification, and weight trimming. 

For nonresponse adjustments, GEM controls at the margins as opposed to controlling at 
the cell level, as weighting class adjustments do. This approach allows more variables to be 
considered. GEM is designed so that the sum of the unadjusted weights for all eligible units 
equals the sum of the adjusted weights for respondents. 
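
GEM itself is proprietary and is not reproduced here. As a simplified stand-in, the sketch below uses ordinary raking (iterative proportional fitting) to show what "controlling at the margins" means: respondent weights are scaled until, for each level of each predictor, they sum to the base-weight total of all eligible units. The function name, the record layout, and the example values are hypothetical, and raking lacks GEM's bounds on the adjustment factors.

```python
def rake_nonresponse(units, margins, max_iter=100, tol=1e-10):
    """Adjust respondent weights so that, for each margin variable, they sum
    to the base-weight total over all eligible units at every level.
    Nonrespondents receive an adjusted weight of zero."""
    # Control totals: base-weight sums over respondents AND nonrespondents.
    controls = {}
    for m in margins:
        totals = {}
        for u in units:
            totals[u[m]] = totals.get(u[m], 0.0) + u["w"]
        controls[m] = totals

    adjusted = [u["w"] if u["resp"] else 0.0 for u in units]
    for _ in range(max_iter):
        max_change = 0.0
        for m in margins:
            # Current respondent weight sums by level of this margin.
            current = {}
            for u, a in zip(units, adjusted):
                current[u[m]] = current.get(u[m], 0.0) + a
            for i, u in enumerate(units):
                if adjusted[i] > 0:
                    factor = controls[m][u[m]] / current[u[m]]
                    adjusted[i] *= factor
                    max_change = max(max_change, abs(factor - 1.0))
        if max_change < tol:
            break
    return adjusted


# Hypothetical eligible units: base weight, response status, two predictors.
units = [
    {"w": 10.0, "resp": True,  "sex": "M", "region": "N"},
    {"w": 10.0, "resp": False, "sex": "M", "region": "N"},
    {"w": 20.0, "resp": True,  "sex": "M", "region": "S"},
    {"w": 15.0, "resp": True,  "sex": "F", "region": "N"},
    {"w": 15.0, "resp": False, "sex": "F", "region": "S"},
    {"w": 30.0, "resp": True,  "sex": "F", "region": "S"},
]
adjusted = rake_nonresponse(units, ["sex", "region"])
```

Because only the margins are controlled, the sex-by-region cells need not match individually, which is what lets a marginal approach carry more predictors than a cell-based weighting class adjustment.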

Extreme weights occur in the ELS:2002 data because of small probabilities of sample 
selection or because of weight adjustments. These extreme weights (either very small or very 
large) can significantly increase the variance of estimates. One way to account for this and 
decrease the variance is to trim and smooth extreme weights within prespecified domains. Note 
that trimming weights has the potential to increase bias. However, the increase in bias is often 
more than offset by the decrease in variance that trimming produces, which reduces the mean 
square error (MSE) of an estimate, defined as variance plus bias squared. 


11 Poststratification typically refers to the process of adjusting sample weights so that the weights sum to population totals 
derived from sources external to the sample of interest. Calibration is used to denote adjusting weight sums to sum to prior-round 
totals. 

The innovation introduced in GEM is the ability to incorporate specific lower and upper 
bounds. An important application of this feature is to identify at each adjustment step an initial 
set of cases with extreme weights and to use specific bounds to exercise control over the final 
adjusted weights. Thus, there is built-in control for extreme weights in GEM. 

GEM uses the median +/- X * IQR to identify extreme weights, where X is any number, 
typically between 2 and 3, and IQR is the interquartile range. There are also different points in 
the weight adjustment process during which weight trimming can occur. GEM has options to 
make adjustments for extreme weights as part of the nonresponse adjustment process and as part 
of the poststratification process. GEM adjusted for ELS:2002 third follow-up extreme weights 
during both nonresponse adjustment and calibration. For GEM, a variable or set of 
variables is selected to be used to identify extreme weights within each level of the variable(s), 
and the variables race and school type were chosen. Prior to running GEM, the unweighted and 
weighted percentages of extreme weights were examined for all four levels of race crossed with 
the three levels of school type using various multiples of the IQR (2.0, 2.1, 2.2, ..., 4.0), and IQR 
multipliers were selected for use in the subsequent weight trimming processes. 
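
The median +/- X * IQR rule can be sketched as follows. This is an illustration only: the function name, the domain labels, the example weights, and the default multiplier of 2.5 are hypothetical, and the quartile definition is whatever Python's `statistics.quantiles` uses by default, which may differ from the one used in GEM.

```python
import statistics
from collections import defaultdict


def flag_extreme_weights(weights, domains, multiplier=2.5):
    """Return a list of booleans marking weights that fall outside
    median +/- multiplier * IQR, computed separately within each domain."""
    by_domain = defaultdict(list)
    for w, d in zip(weights, domains):
        by_domain[d].append(w)

    bounds = {}
    for d, ws in by_domain.items():
        # quantiles(n=4) returns the three quartile cut points; each domain
        # needs at least two weights.
        q1, median, q3 = statistics.quantiles(ws, n=4)
        iqr = q3 - q1
        bounds[d] = (median - multiplier * iqr, median + multiplier * iqr)

    return [not (bounds[d][0] <= w <= bounds[d][1])
            for w, d in zip(weights, domains)]


# Hypothetical weights in two domains (e.g., one race by school type cell each).
weights = [100, 110, 120, 130, 140, 150, 160, 2000, 50, 55, 60, 65, 70]
domains = ["a"] * 8 + ["b"] * 5
flags = flag_extreme_weights(weights, domains)
```

Repeating the call over a grid of multipliers and tabulating the share of flagged cases per domain mirrors, in spirit, the examination of IQR multiples described above.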

6.2.3.2 Predictor Variables for Nonresponse Models 

To create weight adjustments that account for nonresponse, predictor variables must be 
incorporated into the modeling process. Because the modeling process uses both respondents and 
nonrespondents, the information included in the nonresponse models must be known for both 
respondents and nonrespondents. 

The third follow-up respondents include individuals who were base-year nonrespondents, 
individuals who were first follow-up nonrespondents, and individuals who were second follow-up 
nonrespondents. Consequently, most information collected as part of the base-year, first 
follow-up, and second follow-up surveys could not be used in the nonresponse adjustments. The 
variables used in the nonresponse models primarily consisted of sampling frame information, 
base-year sample school information, and some demographic characteristics. Table 22 lists all 
information used in at least one of the nonresponse models created for the third follow-up. 

All school-level information was initially included in every nonresponse model; variables were 
removed only where necessary to ensure convergence of the iterative 
model-fitting process. Because the student-level information was not 
available for all third follow-up sample members, some information was used in some models 
but not in others. Details of the student-level information used in the various nonresponse models 
may be found in sections 6.2.5 and 6.2.6. 
Table 22. Information used in third follow-up nonresponse models 

| School-level information | Student-level information |
|---|---|
| School type | Student race/ethnicity |
| Metropolitan status | Student sex |
| Region | Student’s native language |
| 10th-grade enrollment | Family composition |
| Total enrollment | Parents’ highest level of education |
| Number of minutes per class | Mother/female guardian’s occupation |
| Number of class periods | Father/male guardian’s occupation |
| Number of school days | Total family income from all sources |
| Percent of students receiving free or reduced-price lunch | Socioeconomic status (SES) |
| Number of full-time teachers | G10COHRT (member of the sophomore cohort) |
| Percent of full-time teachers certified | G12COHRT (member of the senior cohort) |
| Number of part-time teachers | Enrollment status |
| Number of grades taught at the school | |
| School level | |
| Coeducational status | |
| Percent of students with an IEP | |
| Percent of students with an LEP | |
| Percent Hispanic 10th-grade students | |
| Percent Asian 10th-grade students | |
| Percent Black 10th-grade students | |


NOTE: School-level information is from the base year (2002). IEP = individualized education program. LEP = limited English 
proficiency. 

SOURCE: U.S. Department of Education, National Center for Education Statistics, Education Longitudinal Study of 2002 
(ELS:2002), Base Year, 2002, First Follow-up, 2004, Second Follow-up, 2006, and Third Follow-up, 2012. 

6.2.3.3 Chi-squared Automatic Interaction Detection for Nonresponse Models 

For those nonresponse adjustments that included interactions of the items listed in 
table 22, Chi-squared automatic interaction detection analysis (CHAID) was performed on the 
predictor variables to detect important interactions for the logistic models used to produce 
nonresponse weight adjustment factors. The CHAID analysis divided the data into segments that 
differed with respect to the response variable (fielded, did not refuse, or respondent, depending 
on the model). The segmentation process first divided the sample into groups based on categories 
of the most significant predictor of response. It then split each of these groups into smaller 
subgroups based on other predictor variables. It also merged categories of a variable that were 
found to be insignificant. The splitting and merging process continued until no more statistically 
significant predictors were found or until some other stopping rule was met. The interactions 
from the final CHAID segments were then defined. 
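
The first step of the segmentation, choosing the predictor most strongly associated with response, can be sketched as below. This is a deliberate simplification and not the CHAID implementation used for ELS:2002: it ranks predictors by the raw Pearson chi-squared statistic rather than by an adjusted p-value, and it performs neither category merging nor recursive splitting. The function names and the example data are hypothetical.

```python
from collections import Counter


def pearson_chi2(xs, ys):
    """Pearson chi-squared statistic for the cross-tabulation of xs by ys."""
    n = len(xs)
    cell = Counter(zip(xs, ys))
    row = Counter(xs)
    col = Counter(ys)
    stat = 0.0
    for r in row:
        for c in col:
            expected = row[r] * col[c] / n
            stat += (cell[(r, c)] - expected) ** 2 / expected
    return stat


def best_first_split(predictors, response):
    """Name of the predictor with the largest chi-squared statistic.
    Comparable across predictors only when they have the same number of
    categories; CHAID proper compares adjusted p-values instead."""
    return max(predictors,
               key=lambda name: pearson_chi2(predictors[name], response))


# Hypothetical data: 1 = respondent, 0 = nonrespondent.
response = [1, 1, 1, 1, 0, 0, 0, 0]
predictors = {
    "school_type": ["pub", "pub", "pub", "pub", "priv", "priv", "priv", "priv"],
    "sex": ["M", "F", "M", "F", "M", "F", "M", "F"],
}
first_split = best_first_split(predictors, response)
```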

The interaction segments and all main effects were subjected to variable screening in the 
GEM logistic procedure. The initial model for a given adjustment step included all of the 
variables listed in table 22 that were available for respondents and nonrespondents and, where 
interaction terms were used, included the segments identified via CHAID. The least significant 
variables were deleted sequentially until the deletion of additional variables did not appreciably 
improve the unequal weight effect (UWE). Different bounds on the weight adjustments, 
depending on whether the weights were classified as extreme, were used to implement 
nonresponse adjustment, truncation, and smoothing in one step. 
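
This documentation does not give a formula for the UWE. A commonly used definition, assumed here, is Kish's design effect from unequal weighting, n * sum(w^2) / (sum(w))^2, which equals 1 plus the squared coefficient of variation of the weights (using the population form of the variance). The function name is hypothetical.

```python
def unequal_weighting_effect(weights):
    """Kish's approximate design effect due to unequal weighting:
    n * sum(w^2) / (sum(w))^2. Equals 1.0 when all weights are equal."""
    n = len(weights)
    total = sum(weights)
    return n * sum(w * w for w in weights) / (total * total)


uwe_equal = unequal_weighting_effect([250.0, 250.0, 250.0, 250.0])
uwe_unequal = unequal_weighting_effect([1.0, 3.0])
```

Under this definition, dropping a predictor that contributes little to explaining response but adds weight variation lowers the UWE, which is the kind of improvement the screening step monitors.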

6.2.4 Base Weight 

The base weight used to produce each of the third follow-up analysis weights was the 
second follow-up design weight, F2DWT.³² The second follow-up design weight incorporates a 
simple ratio adjustment to account for unknown eligibility of some ELS:2002 sample members. 
Additional information about the adjustment for unknown eligibility may be found in the 
ELS:2002 Base-Year to Second Follow-up Data File Documentation (NCES 2008-347). All third 
follow-up analysis weights were produced by applying a series of nonresponse and calibration 
adjustments to F2DWT. 

Subsequent adjustments to F2DWT varied by third follow-up analysis weight. The 
nonresponse and calibration adjustments applied to F2DWT to produce the third follow-up 
weights are described in section 6.2.5. Figure 8 summarizes the weight adjustments applied to 
the second follow-up design weight to produce the six third follow-up analysis weights. 


32 F2DWT is not provided in the third follow-up data files; it is referenced here to facilitate discussion of the weighting 
adjustments used to produce the third follow-up weights. 
Figure 8. Third follow-up weight adjustments 



SOURCE: U.S. Department of Education, National Center for Education Statistics, Education Longitudinal Study of 2002 
(ELS:2002), Base Year, 2002, First Follow-up, 2004, Second Follow-up, 2006, and Third Follow-up, 2012. 


6.2.5 Details of Weight Adjustments 

The nonresponse and calibration adjustments used to produce F3QWT and 
F3QTSCWT are described in section 6.2.5.1, and the corresponding adjustments for 
F3BYPNLWT, F3BYTSCWT, F3F1PNLWT, and F3F1TSCWT are discussed in section 6.2.5.2. 

6.2.5.1 Weight Adjustments for F3QWT and F3QTSCWT 

The third follow-up weight (F3QWT) was computed for those sample members who fully 
or partially completed the third follow-up questionnaire. As was the case in the second 
follow-up, prior-round questionnaire-incapable sample members who did not respond in the third 
follow-up were considered to be out of scope. 

With a few exceptions, second follow-up eligible sample students remained eligible for 
the third follow-up sample. Deceased students were out of scope for the third follow-up. Students 
who left the country, were unavailable for the duration of the study (e.g., in military boot camp), 
or were institutionalized were also out of scope for the third follow-up. 

The third follow-up high school transcript weight (F3QTSCWT) was created by adjusting 
the third follow-up weight F3QWT. Two adjustments were applied to F3QWT. The first 
adjustment was a nonresponse adjustment used to account for those ELS:2002 sample members 
who did not have a high school transcript. The second adjustment was used to calibrate weight 
sums of the high school transcript weight to ensure agreement with weight sums generated with 
F3QWT. 
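
The calibration step can be pictured with a simple ratio adjustment: within each calibration domain, the nonresponse-adjusted transcript weights are scaled so that they sum to a control total (here, the corresponding F3QWT sum). The actual adjustment was model-based (GEM) with bounds on the factors, so this is only a sketch of the goal, not of the method; the function name, domain labels, and values are hypothetical.

```python
def calibrate_to_controls(weights, domains, controls):
    """Scale weights within each domain so they sum to that domain's
    control total. `controls` maps a domain label to its target sum."""
    sums = {}
    for w, d in zip(weights, domains):
        sums[d] = sums.get(d, 0.0) + w
    return [w * controls[d] / sums[d] for w, d in zip(weights, domains)]


# Hypothetical transcript-weight inputs and control totals by domain.
pre_calibration = [10.0, 20.0, 30.0, 40.0]
domain_labels = ["a", "a", "b", "b"]
control_totals = {"a": 60.0, "b": 35.0}
calibrated = calibrate_to_controls(pre_calibration, domain_labels, control_totals)
```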

The tables in appendix F summarize the nonresponse and calibration models used to 
produce the third follow-up weights. There are two types of tables provided in appendix F: 

• nonresponse tables that list variables used in nonresponse models and indicate the 
number of respondents, the weighted response rate, and the average weight adjustment by 
each level of each predictor variable; and 

• calibration tables that list variables used in calibration models and indicate the control 
total and average weight adjustment by each level of each variable used in the calibration 
model. 

The list of variables (main effects and interactions) used in the student nonresponse 
adjustment for F3QWT may be found in table F-1 (appendix F) while the variables used in the 
subsequent calibration model are listed in table F-2. The nonresponse model applied to F3QWT 
as part of the process to produce F3QTSCWT is summarized in table F-3 and the subsequent 
calibration model is summarized in table F-4. 

Table 23 shows various statistical properties of the final third follow-up weights F3QWT 
and F3QTSCWT. 


Education Longitudinal Study of 2002 Base-year to Third Follow-up Data File Documentation 




Chapter 6. Weighting, Imputation, Bias Assessment, and Design Effects 


Table 23. Statistical properties of F3QWT and F3QTSCWT weights: 2012 

| Weight | F3QWT | F3QWT 10th-grade cohort | F3QWT 12th-grade cohort | F3QTSCWT | F3QTSCWT 10th-grade cohort | F3QTSCWT 12th-grade cohort |
|---|---|---|---|---|---|---|
| Mean | 248.1 | 247.4 | 244.1 | 276.7 | 276.0 | 267.7 |
| Variance | 27,829.4 | 27,642.3 | 28,130.6 | 41,050.9 | 40,864.1 | 38,317.6 |
| Standard deviation | 166.8 | 166.3 | 167.7 | 202.6 | 202.1 | 195.7 |
| Coefficient of variation (x 100) | 67.2 | 67.2 | 68.7 | 73.2 | 73.2 | 73.1 |
| Minimum | 5.2 | 5.2 | 5.2 | 5.3 | 5.3 | 5.3 |
| Maximum | 903.0 | 861.5 | 903.0 | 2,290.0 | 2,290.0 | 2,290.0 |
| Skewness | 0.9 | 0.9 | 0.9 | 1.5 | 1.5 | 1.4 |
| Kurtosis | 0.4 | 0.4 | 0.5 | 4.1 | 4.2 | 3.4 |
| Sum | 3,287,051 | 3,248,820 | 2,844,923¹ | 3,287,051 | 3,248,820 | 2,844,241 |
| Number of cases | 13,250 | 13,133 | 11,656 | 11,881 | 11,770 | 10,624 |

1 Control totals and cohort status used to construct third follow-up weights match corresponding second follow-up values, while the 
cohort status used to produce this table incorporates changes for a handful of cases that result in a slight increase in the sum of the 
weights. 

SOURCE: U.S. Department of Education, National Center for Education Statistics, Education Longitudinal Study of 2002 
(ELS:2002), Base Year, 2002, First Follow-up, 2004, Second Follow-up, 2006, and Third Follow-up, 2012. 
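
The rows of table 23 are related by simple identities, which can be used as a check when working with the weights. Using the overall F3QWT column as a worked example:

```latex
\begin{align*}
\text{standard deviation} &= \sqrt{\text{variance}} = \sqrt{27{,}829.4} \approx 166.8,\\
\text{coefficient of variation} \times 100 &= 100 \times \frac{166.8}{248.1} \approx 67.2,\\
\text{mean} \times \text{number of cases} &= 248.1 \times 13{,}250 = 3{,}287{,}325 \approx 3{,}287{,}051 .
\end{align*}
```

The small gap in the last line is consistent with the mean having been rounded to one decimal place before multiplication.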

6.2.5.2 Weight Adjustments for F3BYPNLWT, F3BYTSCWT, F3F1PNLWT, and F3F1TSCWT 

F3BYPNLWT was calculated for all sample members who: 

• fully or partially completed a third follow-up questionnaire and fully or partially 
completed a base-year questionnaire; or 

• fully or partially completed a third follow-up questionnaire and were questionnaire- 
incapable in the base year. 

Sample members who were questionnaire-incapable in the base year were treated as base- 
year respondents for the purposes of constructing F3BYPNLWT. Questionnaire-incapable 
sample members who did not respond in the third follow-up were considered to be out of scope. 

The nonresponse adjustments used to create F3BYPNLWT accounted for the same two 
nonresponse mechanisms as the adjustment process used to create F3QWT, described in 
section 6.2.5.1. The list of variables (main effects and interactions) used in the student 
nonresponse adjustment for F3BYPNLWT may be found in table F-5 (appendix F) while the 
variables used in the subsequent calibration model are listed in table F-6. 

F3BYTSCWT was calculated for all sample members who: 

• fully or partially completed a third follow-up questionnaire and fully or partially 
completed a base-year questionnaire and have fully or partially complete high school 
transcript data; or 

• fully or partially completed a third follow-up questionnaire and were questionnaire- 
incapable in the base year and have fully or partially complete high school transcript data. 
The third follow-up high school transcript weight (F3BYTSCWT) was created by 
adjusting the third follow-up weight F3BYPNLWT. Two adjustments were applied to 
F3BYPNLWT. The first adjustment was a nonresponse adjustment used to account for those 
ELS:2002 sample members who did not have a high school transcript because a school refused 
to provide the transcript or those sample members who refused to allow their transcript to be 
collected. The second adjustment was used to calibrate weight sums of the transcript weight to 
ensure agreement with weight sums generated with F3QWT for the 10th-grade cohort. The 
nonresponse model applied to F3BYPNLWT as part of the process to produce F3BYTSCWT is 
summarized in table F-7 and the subsequent calibration model is summarized in table F-8. 

F3F1PNLWT was calculated for all sample members who: 

• fully or partially completed a third follow-up questionnaire and fully or partially 
completed a first follow-up questionnaire; or 

• fully or partially completed a third follow-up questionnaire and were questionnaire-incapable 
in the first follow-up. 

Sample members who were questionnaire-incapable in the first follow-up were treated as 
first follow-up respondents for the purposes of constructing F3F1PNLWT. Questionnaire- 
incapable sample members who did not respond in the third follow-up were considered to be out 
of scope. 

The nonresponse adjustments used to create F3F1PNLWT accounted for the same 
nonresponse mechanisms as the adjustment process used to create F3QWT, described in 
section 6.2.5.1. The list of variables (main effects and interactions) used in the student 
nonresponse adjustment for F3F1PNLWT may be found in table F-9 (appendix F) while the 
variables used in the subsequent calibration model are listed in table F-10. 

F3F1TSCWT was calculated for all sample members who: 

• fully or partially completed a third follow-up questionnaire and fully or partially 
completed a first follow-up questionnaire and have full or partial high school transcript 
data; or 

• fully or partially completed a third follow-up questionnaire and were questionnaire-incapable 
in the first follow-up and have full or partial high school transcript data. 

The third follow-up high school transcript weight (F3F1TSCWT) was created by adjusting the 
third follow-up weight F3F1PNLWT. Two adjustments were applied to F3F1PNLWT. The first 
adjustment was a nonresponse adjustment used to account for those ELS:2002 sample members 
who did not have a high school transcript because a school refused to provide the transcript or 
those sample members who refused to allow their transcript to be collected. The second 
adjustment was used to calibrate weight sums of the transcript weight to ensure agreement with 
weight sums generated with F3QWT. The nonresponse model applied to F3F1PNLWT as part of 
the process to produce F3F1TSCWT is summarized in table F-11 and the subsequent calibration 
model is summarized in table F-12. Table 24 shows various statistical properties of the final third 
follow-up weights. 
Table 24. Statistical properties of F3BYPNLWT, F3BYTSCWT, F3F1PNLWT, and F3F1TSCWT weights: 2012 

| Weight | F3BYPNLWT | F3BYTSCWT | F3F1PNLWT | F3F1PNLWT 10th-grade cohort | F3F1PNLWT 12th-grade cohort | F3F1TSCWT | F3F1TSCWT 10th-grade cohort | F3F1TSCWT 12th-grade cohort |
|---|---|---|---|---|---|---|---|---|
| Mean | 247.4 | 276.0 | 264.1 | 263.6 | 254.2 | 291.6 | 291.0 | 277.0 |
| Variance | 27,591.5 | 41,469.1 | 33,789.4 | 33,659.2 | 31,565.9 | 49,220.5 | 49,098.9 | 43,383.2 |
| Standard deviation | 166.1 | 203.6 | 183.8 | 183.5 | 177.7 | 221.9 | 221.6 | 208.3 |
| Coefficient of variation (x 100) | 67.1 | 73.8 | 69.6 | 69.6 | 69.9 | 76.1 | 76.1 | 75.2 |
| Minimum | 5.2 | 5.2 | 5.2 | 5.2 | 5.3 | 5.4 | 5.4 | 5.4 |
| Maximum | 837.8 | 2,163.2 | 3,073.6 | 3,073.6 | 959.0 | 2,509.0 | 2,509.0 | 2,509.0 |
| Skewness | 0.9 | 1.4 | 1.2 | 1.2 | 1.0 | 1.7 | 1.7 | 1.6 |
| Kurtosis | 0.3 | 3.8 | 4.7 | 4.8 | 0.7 | 5.4 | 5.4 | 4.9 |
| Sum | 3,248,820 | 3,248,820 | 3,287,051 | 3,248,820 | 2,844,241 | 3,287,051 | 3,248,820 | 2,844,241 |
| Number of cases | 13,132 | 11,769 | 12,444 | 12,327 | 11,188 | 11,274 | 11,163 | 10,268 |


SOURCE: U.S. Department of Education, National Center for Education Statistics, Education Longitudinal Study of 2002 (ELS:2002), Base Year, 2002, First Follow-up, 2004, Second 
Follow-up, 2006, and Third Follow-up, 2012. 
6.2.6 BRR Weights 

Six sets of 200 BRR replicate weights were computed to support a variance estimation 
procedure in addition to Taylor series variance estimation supported by the weights described in 
section 6.2.2. The six sets correspond to the six weights described in section 6.2.2: 

• F3Q001 through F3Q200; corresponds to F3QWT 

• F3TRS001 through F3TRS200; corresponds to F3QTSCWT 

• F3BYP001 through F3BYP200; corresponds to F3BYPNLWT 

• F3BYT001 through F3BYT200; corresponds to F3BYTSCWT 

• F3F1P001 through F3F1P200; corresponds to F3F1PNLWT 

• F3F1T001 through F3F1T200; corresponds to F3F1TSCWT 
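
When preparing an analysis file, the 200 replicate weight names that accompany a given analysis weight can be generated rather than typed. The sketch below follows the naming pattern in the list above (a prefix followed by a three-digit replicate number); the dictionary and function names are hypothetical.

```python
# Replicate-weight name prefixes, keyed by the analysis weight they accompany.
REPLICATE_PREFIXES = {
    "F3QWT": "F3Q",
    "F3QTSCWT": "F3TRS",
    "F3BYPNLWT": "F3BYP",
    "F3BYTSCWT": "F3BYT",
    "F3F1PNLWT": "F3F1P",
    "F3F1TSCWT": "F3F1T",
}


def replicate_names(weight, n_replicates=200):
    """Variable names of the BRR replicate weights for an analysis weight,
    e.g. F3Q001 ... F3Q200 for F3QWT."""
    prefix = REPLICATE_PREFIXES[weight]
    return [f"{prefix}{i:03d}" for i in range(1, n_replicates + 1)]


f3q_replicates = replicate_names("F3QWT")
```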

The third follow-up replicate weights were computed in a manner similar to those 
computed for the base year and the first and second follow-ups. 

The BRR procedure is a variance estimation procedure that computes the variance based 
on a balanced set of pseudoreplicates. The BRR variance estimation process involves modeling 
the design as if it were a two primary sampling unit (PSU) per stratum design. Variances were 
calculated using a replication variance estimation procedure, with a balanced set of 200 
replicates as the groups. Balancing was done by using an orthogonal matrix (200 x 200 
Hadamard matrix) and allows the use of fewer than the full set of 2^L possible replicates, where L is 
the number of analysis strata. To achieve full orthogonal balance, the number of BRR strata 
needs to be less than the number of replicates. Therefore, 200 replicates were created in 199 
strata. Section 6.2.6.1 describes the strata and PSUs (replicates) that were created for the 
base-year replicate weights and used again in the first, second, and third follow-ups. Section 6.2.6.2 
describes the weight adjustments made for the third follow-up. 
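
For users who compute BRR variances by hand rather than through survey software, the standard (unadjusted) BRR estimator is the average squared deviation of the replicate estimates from the full-sample estimate. The sketch below applies it to a weighted mean. The function names and the toy data are hypothetical, and the sketch assumes the standard estimator with no Fay adjustment.

```python
def wmean(values, weights):
    """Weighted mean of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def brr_variance(values, full_weights, replicate_weights):
    """Standard BRR variance of a weighted mean:
    (1/R) * sum over replicates of (replicate estimate - full estimate)^2."""
    theta = wmean(values, full_weights)
    rep_estimates = [wmean(values, rw) for rw in replicate_weights]
    return sum((t - theta) ** 2 for t in rep_estimates) / len(rep_estimates)


# Toy design: 2 strata x 2 PSUs, one case per PSU, and all 4 half-samples.
values = [1.0, 0.0, 1.0, 0.0]
full_weights = [1.0, 1.0, 1.0, 1.0]
replicate_weights = [
    [2.0, 0.0, 2.0, 0.0],
    [2.0, 0.0, 0.0, 2.0],
    [0.0, 2.0, 2.0, 0.0],
    [0.0, 2.0, 0.0, 2.0],
]
variance = brr_variance(values, full_weights, replicate_weights)
```

With the ELS:2002 files, `full_weights` would be one of the analysis weights (for example F3QWT) and `replicate_weights` the matching 200 replicate columns.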

6.2.6.1 Strata and Primary Sampling Units for Balanced Repeated Replicate Weights 

For Taylor series variance estimation, 361 analysis strata containing responding schools 
were created from the 96 sampling strata based on the sample design. To replicate the school 
weight, it is necessary for the BRR strata to contain all sample schools (respondents and 
nonrespondents). For the base year, 594 analysis strata were formed for the purpose of 
computing school-level Taylor series variance estimates. These 594 analysis strata were 
collapsed into 199 BRR strata. The base-year expected sample size for each sample school in the 
594 strata was estimated, and then strata were collapsed randomly across size groups (small, 
medium, large) so that the 199 strata have approximately equal sizes. Collapsing randomly 
allows schools of different types, regions, or urbanicities to be together in a stratum. This 
provides more degrees of freedom for variance estimation for domains and helps obtain more 
accurate variance estimates within domains. Within each of the 199 BRR strata, there are two 
PSUs. Each school in a stratum was randomly assigned to one of the two PSUs. The strata were 
randomly assigned to the rows of the Hadamard matrix. The 200 columns of the matrix are the 
replicates. Within each stratum, the matrix contains values of +1 and -1; one PSU was randomly 
assigned +1 and the other was assigned -1. For PSUs with a value of +1, the school base 
(sampling) weight was multiplied by 2 to create the initial BRR weight; otherwise the school 
base weight was multiplied by zero. Approximately half of the schools in each of the 200 
replicates have initial BRR weights of zero and the other half have initial BRR weights double 
the initial base weight. 
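
The construction described above can be illustrated with a small sketch. It is not the production code used for ELS:2002: the function names are hypothetical, the Hadamard matrix is built with the Sylvester construction (which only yields orders that are powers of two, unlike the 200 x 200 matrix used in the study), and PSU 0 is always tied to +1 rather than being assigned at random. Strata are mapped to rows 1 onward because row 0 of a Sylvester matrix is all +1 and would keep the same PSU in every replicate.

```python
def sylvester_hadamard(order):
    """Build an order x order Hadamard matrix; order must be a power of two."""
    if order < 1 or order & (order - 1):
        raise ValueError("order must be a power of two")
    h = [[1]]
    while len(h) < order:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def brr_replicate_weights(schools, hadamard):
    """schools: list of (stratum, psu, base_weight); stratum in 0..L-1, psu in {0, 1}.

    Returns one list of initial replicate weights per school (one entry per replicate).
    """
    n_strata = max(stratum for stratum, _, _ in schools) + 1
    if n_strata >= len(hadamard):
        raise ValueError("full balance needs fewer strata than replicates")
    replicate_weights = []
    for stratum, psu, base_weight in schools:
        signs = hadamard[stratum + 1]  # skip row 0, which is all +1
        # Assumption: PSU 0 is kept when the sign is +1, PSU 1 when it is -1.
        replicate_weights.append(
            [2.0 * base_weight if (sign == 1) == (psu == 0) else 0.0 for sign in signs]
        )
    return replicate_weights


def brr_variance(full_estimate, replicate_estimates):
    """Mean squared deviation of the replicate estimates from the full-sample estimate."""
    deviations = [(e - full_estimate) ** 2 for e in replicate_estimates]
    return sum(deviations) / len(deviations)


# Toy example: 3 strata, 2 PSUs (schools) each, 4 replicates.
schools = [(0, 0, 10.0), (0, 1, 12.0), (1, 0, 8.0), (1, 1, 9.0), (2, 0, 15.0), (2, 1, 11.0)]
weights = brr_replicate_weights(schools, sylvester_hadamard(4))
full_total = sum(w for _, _, w in schools)
replicate_totals = [sum(row[r] for row in weights) for r in range(4)]
variance = brr_variance(full_total, replicate_totals)
```

With full orthogonal balance, the BRR variance of a weighted total equals the sum over strata of the squared difference between the two PSU totals, which is what the toy example reproduces.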

6.2.6.2 Balanced Repeated Replicate Weight Adjustments 

Although both Taylor series and BRR variance estimation methods reflect the increase in 
variance because of unequal weighting, the BRR weights can also be designed to reflect the 
variance impact (increase or decrease) of the weight adjustment process. The impact of the 
weight adjustment process is captured by repeating nonresponse adjustment and calibration 
processes on each BRR half-sample. 

The third follow-up replication process mirrored the third follow-up analysis weight 
construction, and the design weight was the second follow-up replicate design weight. All third 
follow-up weight adjustments were replicated, including nonresponse adjustments and 
calibration. The original third follow-up nonresponse and calibration models were used initially 
for each of the 200 replicates. However, some of the models did not converge for some 
replicates, so variables were deleted one by one from the models until convergence was 
achieved. The variables deleted were those that seemed to be causing the convergence problems, 
as long as they were not key design variables. The weight distribution was calibrated to the third 
follow-up F3QWT weight sums. Because the third follow-up weights were not poststratified to 
external (known) totals, the estimates could legitimately reflect some variation in base-year totals 
because of sampling variability. To recognize the calibration to the third follow-up, each 
half-sample was calibrated to the third follow-up half-sample replicate weight sums rather than 
to the third follow-up full sample analysis weight sums. 

6.2.7 Quality Control 

Quality control was emphasized on all activities, including weighting. Because of the 
central importance of the analysis weights to population estimation, each set of weights was 
thoroughly checked. The most fundamental type of check was the verification of totals that are 
algebraically equivalent (e.g., marginal totals of the weights of eligible students prior to 
nonresponse adjustment and of respondents after nonresponse adjustment). In addition, various 
analytic properties of the initial weights, the weight adjustment factors, and the final weights 
were examined, both overall and within sampling strata, including 

• distribution of the weights; 

• ratio of the maximum weight divided by the minimum weight; and 

• unequal weighting design effect, or variance inflation effect (1 + CV²). 

Additionally, two-dimensional tables of before and after weight adjustments were 
reviewed to ensure that the weight distribution was not distorted. 
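
As an illustration of the last two checks in the list, the following minimal sketch computes the ratio of the maximum to the minimum weight and the unequal weighting effect, taking CV to be the coefficient of variation of the weights computed with the population (divide-by-n) variance. The function names are hypothetical and the weights are made up for the example.

```python
def max_min_ratio(weights):
    """Ratio of the largest weight to the smallest weight."""
    return max(weights) / min(weights)


def unequal_weighting_effect(weights):
    """Return 1 + CV^2 of the weights, computed as n * sum(w^2) / (sum(w))^2."""
    n = len(weights)
    total = sum(weights)
    return n * sum(w * w for w in weights) / (total * total)


example_weights = [1.0, 3.0]          # made-up weights
ratio = max_min_ratio(example_weights)
uwe = unequal_weighting_effect(example_weights)
```

Equal weights give an effect of exactly 1; the more the weights vary, the larger the value.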

6.3 Standard Errors and Design Effects 

6.3.1 Standard Errors 

For probability-based sample surveys, most estimates are nonlinear statistics. For 
example, a mean or proportion, which is expressed as Σwy/Σw (see footnote 33), is nonlinear because the 
denominator is a survey estimate of the (unknown) population total. In this situation, the 
variances of the estimates cannot be expressed in closed form. One common procedure for 
estimating variances of survey statistics is the Taylor series linearization procedure. This 
procedure takes the first-order Taylor series approximation of the nonlinear statistic and then 
substitutes the linear representation into the appropriate variance formula based on the sample 
design. Woodruff (1971) presented the mathematical formulation of this procedure. The variance 
estimation must also take into account stratification and clustering. There are other variance 
estimation procedures, such as jackknife and BRR. Taylor series and BRR estimation were 
supported for the base year, first follow-up, and second follow-up and are also supported for the 
third follow-up. 

Variance estimation procedures assumed a with-replacement design at the first stage of 
sampling. Because school sampling rates were moderately low, this assumption yields estimates 
that are only slightly biased in the positive direction. For stratified multistage surveys and a with- 
replacement sample design, the Taylor series procedure requires the specification of analysis 
strata and analysis PSUs. In the base year, 361 analysis strata were formed from the sampling 
strata used in the first stage of sampling, and the analysis PSUs were the individual schools. 
Given that the school sample was selected using probability with minimum replacement (pmr), 
for variance estimation in the base year, variance estimation strata were formed consisting of two 
PSUs per stratum (Chromy 1981). However, when there was an odd number of schools in a 
sampling stratum, one of the analysis strata formed had three PSUs. The same analysis strata and 
PSUs as in the base year were used in the first follow-up, in the second follow-up, and in the 
third follow-up. 
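
To make the linearization idea concrete, the sketch below estimates a weighted mean Σwy/Σw and its Taylor series standard error under a with-replacement design with analysis strata and PSUs. It is a simplified illustration, not the SUDAAN or Stata implementation; the function name and the record layout (stratum, PSU, weight, y) are assumptions made for the example.

```python
from collections import defaultdict
from math import sqrt


def taylor_mean_se(records):
    """records: iterable of (stratum, psu, weight, y). Returns (mean, standard error)."""
    records = list(records)
    total_w = sum(w for _, _, w, _ in records)
    mean = sum(w * y for _, _, w, y in records) / total_w

    # Linearized value of the ratio, summed to the PSU level within each stratum.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for stratum, psu, w, y in records:
        psu_totals[stratum][psu] += w * (y - mean) / total_w

    variance = 0.0
    for psus in psu_totals.values():
        n_h = len(psus)
        if n_h < 2:
            raise ValueError("each analysis stratum needs at least two PSUs")
        z_bar = sum(psus.values()) / n_h
        # Between-PSU variance within the stratum (with-replacement formula).
        variance += n_h / (n_h - 1) * sum((z - z_bar) ** 2 for z in psus.values())
    return mean, sqrt(variance)


# Toy check: one stratum, two single-member PSUs with equal weights.
toy_mean, toy_se = taylor_mean_se([(1, 1, 1.0, 0.0), (1, 2, 1.0, 2.0)])
```

In the toy check the result coincides with the simple random sample standard error, as expected when there is no clustering or unequal weighting.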

As described in chapter 3, the ELS:2002 base-year sampling design was a stratified two-stage 
design. A stratified sample of schools was selected with probabilities proportional to a 
composite measure of size at the first stage, and a stratified systematic sample of students was 
selected from sample schools at the second stage. At the first stage, the school sampling rates 
varied considerably by school sampling strata. At the second stage, Asian and Hispanic students 
were sampled at higher rates than other students. Because of this complex sampling design, 
statistical analyses should be conducted using software that properly accounts for the complex 
survey design. 

33 Σw is the estimated population and y is the corresponding value of the variable for which a mean or proportion is calculated. In 
the case of estimation of a proportion, y is a 0/1 variable indicating whether a certain characteristic is present for the sample 
member. In the case of estimation of a mean, y is either a continuous or ordinal value reported by or associated with the sample 
member. 

Many commonly used statistical computing packages assume that the data were obtained 
from a simple random sample; that is, they assume that the observations are independent and 
identically distributed. When the data have been collected using a complex sampling design, the 
simple random sampling assumption usually leads to an underestimate of the sampling variance, 
which would lead to artificially small confidence intervals and liberal hypothesis test results (i.e., 
rejecting the null hypothesis when it is in fact true more often than indicated by the nominal 
Type I error level) (Carlson, Johnson, and Cohen 1993). 

Statistical strategies that have been developed to address this issue include first-order 
Taylor series expansion of the variance equation, BRR, and the jackknife approach (Wolter 
2007). Special-purpose software packages that have been developed for analysis of complex 
sample survey data include SUDAAN, WesVar, Stata, SPSS, SAS, R, and AM. Evaluations of 
the relative performances of these packages are reported by Cohen (1997). 

• SUDAAN is a commercial product developed by RTI; information regarding the 
features of this package and its lease terms is available from the website 
http://www.rti.org/sudaan . 

• WesVar is a product of Westat, Inc.; information regarding the features of this 
package and its lease terms is available from the website 
http://www.westat.com/westat/expertise/information_systems/wesvar/index.cfm . 

• Information regarding the features of Stata and its lease terms is available from the 
website http://www.stata.com . 

• SPSS is a product of IBM; information regarding the features of this package may be 
found on the website http://www-01.ibm.com/software/analytics/spss/products/statistics . 

• SAS information (SAS-STAT User’s Guide) can be found at 
http://support.sas.com/documentation/ . 

• R information is available at http://www.r-project.org . 

• AM software, a product developed by the American Institutes for Research (AIR), 
can be downloaded for free from the website http://am.air.org/ . 

The following pseudo code is an example of generic SUDAAN code used to produce 
estimates and standard errors using Taylor series. The symbols /* and */ in the code indicate the 
beginning and end of a comment. Note that the dataset must be sorted by analysis strata and 
analysis PSUs before analyzing the data in SUDAAN. 

proc descript data=/* insert filename (see footnote 34) */ design=wr; 
    nest analstr analpsu;   /* analysis strata and analysis PSUs, respectively */ 
    weight F3QWT; 
    var /* insert variables */; 
    subpopn /* insert domain of interest if domain is a subset of students */; 
    print nsum mean semean; 
run; 

Corresponding Stata version 9 code is: 

drop _all 
set memory 18000 
use "/* insert filename */", clear 
/* analstr and analpsu are the analysis strata and analysis PSUs, respectively */ 
sort analstr analpsu 
svyset analpsu [pweight=f3qwt], strata(analstr) 
svy, subpop(/* name of domain */): tab /* insert variables */, row se 

Earlier versions of Stata require the following syntax: 

svyset [pweight=f3qwt], strata(analstr) psu(analpsu) 
svytab /* insert variables */, subpop(/* name of domain */) row se 

6.3.2 Design Effects 

The impact of the departures of the ELS:2002 complex sample design from a simple 
random sample design on the precision of sample estimates can be measured by the design 
effect. The design effect is the ratio of the actual variance of the statistic to the variance that 
would have been obtained had the sample been a simple random sample. The design standard 
errors will be different from the standard errors that are based on the assumption that the data are 
from a simple random sample. The ELS:2002 sample departs from the assumption of simple 
random sampling in three major respects: student samples were stratified by student 
characteristics, students were selected with unequal probabilities of selection, and the sample of 
students was clustered by school. A simple random sample is, by contrast, unclustered and not 
stratified. Additionally, in a simple random sample, all members of the population have the same 
probability of selection. Generally, clustering and unequal probabilities of selection increase the 
variance of sample estimates relative to a simple random sample, and stratification decreases the 
variance of estimates. 
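
The definition above translates directly into a small calculation. The sketch below is illustrative only; the function name and the standard errors in the example are made up.

```python
from math import sqrt


def design_effect(design_se, srs_se):
    """Return (DEFF, DEFT) from a design-based and a simple random sample standard error."""
    deff = (design_se / srs_se) ** 2   # ratio of the two variances
    return deff, sqrt(deff)            # DEFT is the square root of DEFF


# Made-up example: design-based SE of 0.013 against an SRS SE of 0.010.
deff, deft = design_effect(0.013, 0.010)
```

In this made-up case the design-based variance is about 1.69 times the simple random sample variance, so the design-based standard error is about 1.3 times larger.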

Standard errors and design effects were computed for 29 means and proportions, both overall 
for all respondents and for subgroups of respondents. The subgroups are similar to those used in 
the National Education Longitudinal Study of 1988 (NELS:88) and prior rounds of ELS:2002: 

• sex (male and female); 

• race/ethnicity (Asian/Pacific Islander, Black, Hispanic, White/other, multiracial); 

• school type (public, Catholic, other private); and 

• socioeconomic status (SES) (lowest quarter, middle two quarters, and highest quarter). 

34 All ELS:2002 sample members should be included in the file used with SUDAAN and all other software packages so that 
variance estimates are calculated correctly. 

For purposes of trend comparisons, it is valuable to compare design effects across cohorts 
(e.g., ELS:2002 versus NELS:88), so the NELS:88 Base-Year to Fourth Follow-Up User’s 
Manual (NCES 2002-323) was initially used to help guide the items picked. However, the 
available ELS:2002 items differ substantially from the items used in constructing design effects 
for NELS:88 because there were substantial differences in the types and composition of variables 
produced in each study. Nonetheless, the items chosen are a good representation of the different 
items in the ELS:2002 third follow-up survey questionnaire. These items should provide a range 
of design effects that will give a reasonable average for both the entire sample and for 
analytically important subgroups. However, because item matching with NELS:88 was difficult, 
the ELS:2002 design effects may not be comparable with the NELS:88 reported design effects. 
Ideally, one would like to compare exact items between surveys. Table 25 lists the 29 items 
chosen for computing design effects for all respondents and subgroups. For categorical variables, 
the item value corresponding to the category of interest is listed. 

For all respondents, the standard errors and design effects were calculated using all six 
third follow-up weights. 

Table 25. Items chosen for computing design effects for all respondents and subgroups 

Survey item                                                                       Variable name   Item value¹ 
First job is working for an employer                                              F3A01A          1 
Working for pay 35 or >35 hours per week                                          F3A03A          1 
Extent to which postsecondary education prepared for life: Work and career        F3A15A          1 
Postsecondary education paid with grants/scholarships                             F3A23           1 
Respondent served in military                                                     F3B01           1 
Current employer offers health insurance                                          F3B24           1 
College degree but not advanced degree needed for job at age 30                   F3C14           6 
Number of friends or roommates living with respondent                             F3D15A          Continuous 
Number of siblings living with respondent                                         F3D15B          Continuous 
Voted in 2008 presidential election                                               F3D38           1 
Respondent performed community service in past 2 years                            F3D40           1 
Volunteered with church-related group                                             F3D41D          1 
Volunteered with school/community organizations                                   F3D41E          1 
Respondent’s parent/guardian divorced in last 2 years                             F3D44A          1 
Respondent’s parent/guardian lost job in last 2 years                             F3D44B          1 
Ever attended a postsecondary school                                              F3EVERATT       1 
Ever dropped out                                                                  F3EVERDO        1 
Ever held a job since leaving high school                                         F3EVRJOB        1 
Fall 2003–Summer 2004 high school graduate                                        F3HSSTAT        1 
Received GED or other equivalency                                                 F3HSSTAT        6 
Respondent’s current marital status is single                                     F3MARRSTATUS    3 
Respondent’s current marital status is married                                    F3MARRSTATUS    1 
At age 30 expects to have a job as a laborer                                      F3ONET2AGE30    37 
At age 30 expects to have a job as a manager                                      F3ONET2AGE30    11 
At age 30 expects to have a job in the military                                   F3ONET2AGE30    55 
At age 30 expects to have a professional job (group a) Business and Financial     F3ONET2AGE30    13 
At age 30 expects to have a sales job                                             F3ONET2AGE30    45 
At age 30 expects to have a job as a school teacher                               F3ONET2AGE30    25 
Expect to finish college, but not advanced degree                                 F3STEXP         6 

¹ For categorical variables, the item value corresponds to the category of interest, and for continuous variables, the item value is 
indicated as continuous. 

NOTE: GED = General Educational Development credential. 

SOURCE: U.S. Department of Education, National Center for Education Statistics, Education Longitudinal Study of 2002 
(ELS:2002), Third Follow-up, 2012. 

Appendix G contains tables of design effects for all respondents. Each table includes the 
survey item (or composite variable), variable name and value, percent estimate, design standard 
error, simple random sample standard error, sample size (n), design effect (DEFF), and square 
root of the design effect (DEFT). Tables 26 to 31 summarize the average DEFFs and DEFTs for 
the full sample, panel samples, and high school transcript samples respectively, for all 
respondents, dropouts, and each subgroup. The reader should note that the mean DEFTs reported 
in tables 26 to 31 were not calculated directly from the mean DEFF but, rather, are based on the 
summary statistics from the tables in appendix G. 
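
The distinction between the mean DEFT and the square root of the mean DEFF can be seen with a few numbers. The sketch below uses made-up item-level design effects, not values from appendix G.

```python
from math import sqrt

# Made-up item-level design effects (DEFFs) for three items.
item_deffs = [1.0, 1.44, 4.0]

mean_deff = sum(item_deffs) / len(item_deffs)
# Mean DEFT: average the item-level square roots.
mean_deft = sum(sqrt(d) for d in item_deffs) / len(item_deffs)
# Square root of the mean DEFF: a different (never smaller) quantity.
root_of_mean_deff = sqrt(mean_deff)
```

Because the square root is a concave function, the average of the item-level DEFTs can never exceed the square root of the average DEFF, which is why the two columns in the summary tables do not square to each other.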




Table 26. Mean design effects (DEFFs) and root design effects (DEFTs) for F3QWT, by selected 
characteristics: 2012 

Characteristic                            Mean design effect   Mean root design effect 
Total                                            1.7                    1.3 
Sex 
  Male                                           1.5                    1.2 
  Female                                         1.6                    1.3 
Race/ethnicity 
  American Indian or Alaska Native               1.3                    1.1 
  Asian or Pacific Islander                      1.5                    1.2 
  Black or African American                      1.5                    1.2 
  Hispanic or Latino                             1.4                    1.2 
  White                                          1.6                    1.3 
  More than one race                             1.6                    1.3 
School type 
  Public                                         1.5                    1.2 
  Catholic                                       1.4                    1.2 
  Other private                                  2.3                    1.5 
Socioeconomic status (SES) 
  Low SES                                        1.4                    1.2 
  Middle SES                                     1.5                    1.2 
  High SES                                       1.6                    1.2 

NOTE: The mean DEFT was not calculated directly from the mean DEFF but, rather, is the average DEFT over selected items. See 
appendix G of this document for more information. 

SOURCE: U.S. Department of Education, National Center for Education Statistics, Education Longitudinal Study of 2002 
(ELS:2002), Third Follow-up, 2012. 




Table 27. Mean design effects (DEFFs) and root design effects (DEFTs) for F3BYPNLWT, by 
selected characteristics: 2012 

Characteristic                            Mean design effect   Mean root design effect 
Total                                            1.7                    1.3 
Sex 
  Male                                           1.5                    1.2 
  Female                                         1.6                    1.3 
Race/ethnicity 
  American Indian or Alaska Native               1.3                    1.1 
  Asian or Pacific Islander                      1.5                    1.2 
  Black or African American                      1.5                    1.2 
  Hispanic or Latino                             1.4                    1.2 
  White                                          1.6                    1.3 
  More than one race                             1.6                    1.3 
School type 
  Public                                         1.5                    1.2 
  Catholic                                       1.4                    1.2 
  Other private                                  2.3                    1.5 
Socioeconomic status (SES) 
  Low SES                                        1.4                    1.2 
  Middle SES                                     1.5                    1.2 
  High SES                                       1.6                    1.2 

NOTE: The mean DEFT was not calculated directly from the mean DEFF but, rather, is the average DEFT over selected items. See 
appendix G of this document for more information. 

SOURCE: U.S. Department of Education, National Center for Education Statistics, Education Longitudinal Study of 2002 
(ELS:2002), Third Follow-up, 2012. 




Table 28. Mean design effects (DEFFs) and root design effects (DEFTs) for F3F1PNLWT, by 
selected characteristics: 2012 

Characteristic                            Mean design effect   Mean root design effect 
Total                                            1.8                    1.3 
Sex 
  Male                                           1.6                    1.3 
  Female                                         1.6                    1.3 
Race/ethnicity 
  American Indian or Alaska Native               1.3                    1.1 
  Asian or Pacific Islander                      1.5                    1.2 
  Black or African American                      1.5                    1.2 
  Hispanic or Latino                             1.5                    1.2 
  White                                          1.6                    1.3 
  More than one race                             1.7                    1.3 
School type 
  Public                                         1.5                    1.2 
  Catholic                                       1.4                    1.2 
  Other private                                  4.3                    1.9 
Socioeconomic status (SES) 
  Low SES                                        1.4                    1.2 
  Middle SES                                     1.6                    1.3 
  High SES                                       1.6                    1.2 

NOTE: The mean DEFT was not calculated directly from the mean DEFF but, rather, is the average DEFT over selected items. See 
appendix G of this document for more information. 

SOURCE: U.S. Department of Education, National Center for Education Statistics, Education Longitudinal Study of 2002 
(ELS:2002), Third Follow-up, 2012. 

Table 29. Mean design effects (DEFFs) and root design effects (DEFTs) for F3QTSCWT, by 
selected characteristics: 2012 

Characteristic                            Mean design effect   Mean root design effect 
Total                                            1.9                    1.4 
Sex 
  Male                                           1.7                    1.3 
  Female                                         1.8                    1.3 
Race/ethnicity 
  American Indian or Alaska Native               1.4                    1.2 
  Asian or Pacific Islander                      1.7                    1.3 
  Black or African American                      1.6                    1.3 
  Hispanic or Latino                             1.6                    1.3 
  White                                          1.7                    1.3 
  More than one race                             1.7                    1.3 
School type 
  Public                                         1.6                    1.3 
  Catholic                                       1.5                    1.2 
  Other private                                  2.7                    1.6 
Socioeconomic status (SES) 
  Low SES                                        1.5                    1.2 
  Middle SES                                     1.6                    1.3 
  High SES                                       1.8                    1.3 

NOTE: The mean DEFT was not calculated directly from the mean DEFF but, rather, is the average DEFT over selected items. See 
appendix G of this document for more information. 

SOURCE: U.S. Department of Education, National Center for Education Statistics, Education Longitudinal Study of 2002 
(ELS:2002), Third Follow-up, 2012. 

Table 30. Mean design effects (DEFFs) and root design effects (DEFTs) for F3BYTSCWT, by 
selected characteristics: 2012 

Characteristic                            Mean design effect   Mean root design effect 
Total                                            1.9                    1.4 
Sex 
  Male                                           1.7                    1.3 
  Female                                         1.8                    1.3 
Race/ethnicity 
  American Indian or Alaska Native               1.4                    1.2 
  Asian or Pacific Islander                      1.7                    1.3 
  Black or African American                      1.7                    1.3 
  Hispanic or Latino                             1.6                    1.3 
  White                                          1.7                    1.3 
  More than one race                             1.7                    1.3 
School type 
  Public                                         1.6                    1.3 
  Catholic                                       1.5                    1.2 
  Other private                                  2.7                    1.6 
Socioeconomic status (SES) 
  Low SES                                        1.5                    1.2 
  Middle SES                                     1.7                    1.3 
  High SES                                       1.8                    1.3 

NOTE: The mean DEFT was not calculated directly from the mean DEFF but, rather, is the average DEFT over selected items. See 
appendix G of this document for more information. 

SOURCE: U.S. Department of Education, National Center for Education Statistics, Education Longitudinal Study of 2002 
(ELS:2002), Third Follow-up, 2012. 

Table 31. Mean design effects (DEFFs) and root design effects (DEFTs) for F3F1TSCWT, by 
selected characteristics: 2012 

Characteristic                            Mean design effect   Mean root design effect 
Total                                            2.0                    1.4 
Sex 
  Male                                           1.8                    1.3 
  Female                                         1.8                    1.3 
Race/ethnicity 
  American Indian or Alaska Native               1.5                    1.2 
  Asian or Pacific Islander                      1.7                    1.3 
  Black or African American                      1.7                    1.3 
  Hispanic or Latino                             1.7                    1.3 
  White                                          1.7                    1.3 
  More than one race                             1.8                    1.3 
School type 
  Public                                         1.7                    1.3 
  Catholic                                       1.5                    1.2 
  Other private                                  2.6                    1.6 
Socioeconomic status (SES) 
  Low SES                                        1.5                    1.2 
  Middle SES                                     1.7                    1.3 
  High SES                                       1.8                    1.3 

NOTE: The mean DEFT was not calculated directly from the mean DEFF but, rather, is the average DEFT over selected items. See 
appendix G of this document for more information. 

SOURCE: U.S. Department of Education, National Center for Education Statistics, Education Longitudinal Study of 2002 
(ELS:2002), Third Follow-up, 2012. 

As discussed in section 6.2, trimming weights reduces the variance, which reduces the 
design effect. Additionally, the items used to compute the mean design effects were different in 
the third follow-up than in prior rounds because the design effects were not expected to change 
much across the four rounds of the study. For purposes of trend comparisons, it is valuable to 
compare design effects across cohorts, as described below, so the items were chosen to be as 
comparable to NELS:88 fourth follow-up items as possible. 

The smaller design effects in ELS:2002 compared with those for the NELS:88 and High 
School and Beyond (HS&B) sophomore cohorts are probably largely a result of the smaller 
degree of subsampling in ELS:2002. Although no subsampling was used for the ELS:2002 third 
follow-up, sample base-year nonrespondents were subsampled for inclusion in the first 
follow-up. In NELS:88, subsampling was performed in the first, third, and fourth follow-ups. See Curtin 
et al. (2002) for more details. In HS&B, sophomore cohort members were subsampled for 
inclusion in the HS&B high school transcript study and this subsample was the basis for the 
HS&B second follow-up study. See Zahs et al. (1995) for more details. As mentioned above, the 
general tendency in longitudinal studies is for design effects to lessen over time, because 
dispersion reduces the original clustering. Subsampling increases design effects because it 
introduces additional variability into the weights with an attendant loss in sample efficiency. 

The smaller design effects in ELS:2002 compared with those for the HS&B sophomore 
cohort also may reflect the somewhat smaller cluster size used in ELS:2002 in the base 
year. Although the clusters were reduced somewhat in the first follow-up for both studies, a 
number of students remained in the base-year school. The HS&B base-year sample design called 
for 36 sophomores selected from each school. The ELS:2002 sample design called for about 26 
sophomores selected from each school. Clustering tends to increase the variance of survey 
estimates because the observations within a cluster are similar and therefore add less information 
than independently selected observations. The impact of clustering depends mainly on two 
factors: the number of observations within each cluster and the degree of within-cluster 
homogeneity. When cluster sizes vary, the impact of clustering (DEFFc) can be estimated by 

DEFFc = 1 + (b - 1) rho, 

where b refers to the average cluster size (the average number of students selected from each 
school) and rho refers to the intraclass correlation coefficient, a measure of the degree of within- 
cluster homogeneity. If the value of rho (which varies from one variable to the next) averaged 
about 0.05 in both studies, then the reduced cluster size in ELS:2002 would almost exactly 
account for the reduction in the design effects relative to HS&B. 
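
The comparison in the preceding paragraph can be worked through numerically. The sketch below plugs the two base-year cluster sizes quoted above (36 for HS&B, about 26 for ELS:2002) into the DEFFc formula, using the hypothetical rho of 0.05 from the text; the function name is illustrative.

```python
def clustering_deff(avg_cluster_size, rho):
    """DEFFc = 1 + (b - 1) * rho, the approximate design effect due to clustering."""
    return 1.0 + (avg_cluster_size - 1.0) * rho


rho = 0.05                               # assumed average intraclass correlation
deffc_hsb = clustering_deff(36, rho)     # HS&B: 36 sophomores per school
deffc_els = clustering_deff(26, rho)     # ELS:2002: about 26 sophomores per school
reduction = deffc_hsb - deffc_els        # equals (36 - 26) * rho
```

Under this assumption the clustering component is roughly 2.75 for HS&B and 2.25 for ELS:2002, a reduction of about 0.5.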

If one must perform a quick analysis of ELS:2002 data without using one of the software 
packages for analysis of complex survey data, the design effects tables in appendix G can be 
used to make approximate adjustments to the standard errors of survey statistics computed using 
the standard software packages that assume simple random sampling designs. One cannot be 
confident regarding the actual design-based standard error without performing the analysis using 
one of the software packages specifically designed for analysis of data from complex sample 
surveys. 

Standard errors for a proportion can be estimated from the standard error computed using 
the formula for the standard error of a proportion based on a simple random sample and the 
appropriate DEFT : 

SE = DEFT * (p(1 - p)/n)^(1/2). 

Similarly, the standard error of a mean can be estimated from the weighted variance of 
the individual scores and the appropriate mean DEFT : 

SE = DEFT * (Var/n)^(1/2). 
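
As a worked illustration of the two approximations, the sketch below applies a DEFT to the simple random sample standard errors of a proportion and of a mean. The function names and the input values are made up for the example; the DEFT would normally be taken from the table for the relevant weight and subgroup.

```python
from math import sqrt


def approx_se_proportion(p, n, deft):
    """Approximate design-based SE of a proportion: DEFT * sqrt(p(1 - p)/n)."""
    return deft * sqrt(p * (1.0 - p) / n)


def approx_se_mean(weighted_variance, n, deft):
    """Approximate design-based SE of a mean: DEFT * sqrt(Var/n)."""
    return deft * sqrt(weighted_variance / n)


# Made-up inputs: p = 0.5 from n = 100 cases with a DEFT of 1.3.
se_p = approx_se_proportion(0.5, 100, 1.3)
# Made-up inputs: weighted variance of 4.0 from n = 400 cases with a DEFT of 1.5.
se_m = approx_se_mean(4.0, 400, 1.5)
```

These are approximations only; as noted above, software designed for complex sample surveys is needed for the actual design-based standard errors.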

Tables 26 to 31 make it clear that the DEFFs and DEFTs vary by subgroup, with the 
largest values appearing among other private schools. It is therefore important to use the mean 
DEFT for the relevant subgroup in calculating approximate standard errors for subgroup 
statistics. 

Standard error estimates may be needed for subgroups that are not shown in appendix G. 
One rule of thumb may be useful in such situations. The general rule states that design effects 
will generally be smaller for groups that are formed by subdividing the subgroups listed in the 
tables. (Smaller subgroups will be affected less by clustering than larger subgroups; in terms of 
the equation for DEFFc, b will be reduced.) Estimates for Hispanic males, for example, will 
generally have smaller design effects than the corresponding estimates for all Hispanics or all 
males. For this reason, it will usually be conservative to use the subgroup mean DEFT to 
approximate standard errors for estimates concerning a portion of the subgroup. This rule only 
applies when the variable used to subdivide a subgroup crosscuts schools. Sex is one such 
variable because most schools include students of both sexes. It will not reduce the average 
cluster size to form groups that are based on subsets of schools. 

Standard errors may also be needed for other types of estimates than the simple means 
and proportions that are the basis for the results presented in the above tables. A second method 
can be used to estimate approximate standard errors for comparisons between subgroups. If the 
subgroups crosscut schools, then the design effect for the difference between the subgroup means 
will be somewhat smaller than the design effect for the individual means; consequently, the 
variance of the difference estimate will be less than the sum of the variances of the two subgroup 
means from which it is derived: 

Var(b-a) = Var(b) + Var(a) 

where Var(b-a) refers to the variance of the estimated difference between the subgroup means, 
and Var(a) and Var(b) refer to the variances of the two subgroup means. This equation assumes 
that the covariance of the subgroup means is negligible. It follows from this equation that Var(a) 
+ Var(b) can be used in place of Var(b-a) with conservative results. 
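
A minimal sketch of this conservative approximation follows: the standard error of a difference is taken as the square root of the sum of the two squared standard errors, ignoring the covariance term. The function name and the example values are made up.

```python
from math import sqrt


def conservative_se_difference(se_a, se_b):
    """Approximate SE of (b - a) as sqrt(Var(a) + Var(b)), ignoring the covariance."""
    return sqrt(se_a ** 2 + se_b ** 2)


# Made-up example: two subgroup means with standard errors of 0.03 and 0.04.
se_diff = conservative_se_difference(0.03, 0.04)
```

Each input standard error would first be adjusted by the appropriate DEFT if it was computed under simple random sampling assumptions.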

A final rule of thumb is that many complex estimators show smaller design effects than 
simple estimators (Kish and Frankel 1974). This principle implies that it will often be 
conservative to use the DEFTs in the above tables in calculating approximate standard errors for 
complex statistics, such as multiple regression coefficients. The procedure for calculating such 
approximate standard errors is the same as with simpler estimates: first, a standard error is 
calculated using the formula for data from a simple random sample; then the standard error is 
multiplied by the appropriate DEFT. 

One analytic strategy for accommodating complex survey designs is to use the mean 
design effect to adjust for the effective sample size resulting from the design. For example, one 
could create a weight that is the multiplicative inverse of the design effect and use that weight (in 
conjunction with sampling weights) to deflate the obtained sample size to take into account the 
inefficiencies resulting from a sample design that is a departure from a simple random sample. 
Using this procedure, statistics calculated by a statistical program such as SAS or SPSS will 
reflect the reduction in sample size in the calculation of standard errors and degrees of freedom. 
Such techniques capture the effect of the sample design on sample statistics only approximately. 
However, although it does not provide a full accounting of the sample design, this procedure 
does provide some adjustment for the sample design and is probably better than conducting 
analysis that assumes the data were collected from a simple random sample. The analyst 
applying this correction procedure should carefully examine the statistical software being used 
and assess whether the program treats weights in such a way as to produce the effect described 
above. 
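
One way to read the procedure described above is sketched below: the sampling weights are rescaled so that they sum to the effective sample size n/DEFF rather than to the population size. This is an interpretation for illustration only; the function name and example values are hypothetical, and whether a given package then uses the weight sum as the sample size must be checked, as cautioned above.

```python
def deff_adjusted_weights(weights, mean_deff):
    """Rescale sampling weights so they sum to the effective sample size n / DEFF."""
    n = len(weights)
    total = sum(weights)
    scale = n / (total * mean_deff)   # normalize to n, then deflate by 1/DEFF
    return [w * scale for w in weights]


# Made-up example: four cases and a mean design effect of 2.0.
adjusted = deff_adjusted_weights([10.0, 20.0, 30.0, 40.0], 2.0)
effective_n = sum(adjusted)
```

The relative sizes of the weights are unchanged, so weighted point estimates are unaffected; only the implied sample size is reduced.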

6.4 Third Follow-up Imputation 

6.4.1 Imputation Variables 

A set of 14 key analytic variables was identified for item imputation on data obtained 
from the ELS:2002 third follow-up member interview. These 14 variables include indicators of 
whether the respondent ever applied to or attended a postsecondary institution, highest level of 
education attained, and various employment indicators such as whether the respondent has held a 
job for pay since high school, as well as total job earnings. Additional variables were considered 
for this list but were excluded because they were deemed to be of little analytic importance. Note 
that some variables with high item response rates were imputed because the variables were 
considered to be key analytic variables. Table 32 lists the selected variables in the order in which 
they were imputed. Note that some of the imputed variables are inputs to other imputed 
variables. For example, if F3STLOANEVR was imputed as a “No” then the value of 
F3STLOANAMT was set to 0. Sets of logical constraints, specific to each imputed variable, 
were used to ensure that imputed values for any given imputed variable are consistent with 
values for all other imputed variables. 

6.4.2 Imputation Methodology 

Stochastic methods were used to impute the missing values for the ELS:2002 third 
follow-up data. Specifically, a weighted sequential hot-deck (WSHD) statistical imputation 
procedure (Cox 1980; Iannacchione 1982) using the final analysis weight (F3QWT) was applied 
to the missing values for the variables in table 32 in the order in which they are listed. The 
WSHD procedure replaces missing data with valid data from a donor record within an 
imputation class. In general, variables with lower item nonresponse rates were imputed earlier in 
the process. 
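
The donor-replacement idea can be illustrated with the simplified Python sketch below. It is not the sequential algorithm of Cox (1980) and Iannacchione (1982): it assumes that a donor is drawn at random within the imputation class with probability proportional to its weight, which is a swapped-in simplification. The function name, the four records, and the weight values are hypothetical; only the variable names F3EVRJOB, F2EVRJOB, and F3QWT come from this documentation.

```python
import random

def weighted_hot_deck(records, target, class_var, weight_var, seed=0):
    """Fill missing `target` values from donors in the same imputation class.

    Simplification: each donor is drawn with probability proportional to its
    weight. This is not the weighted sequential hot deck itself.
    """
    rng = random.Random(seed)
    imputed = []
    for rec in records:
        rec = dict(rec)  # copy so the input records are left untouched
        if rec[target] is None:
            donors = [d for d in records
                      if d[class_var] == rec[class_var] and d[target] is not None]
            if donors:
                donor = rng.choices(
                    donors, weights=[d[weight_var] for d in donors], k=1)[0]
                rec[target] = donor[target]
        imputed.append(rec)
    return imputed

# Hypothetical records: F3EVRJOB is imputed within classes defined by F2EVRJOB.
records = [
    {"F2EVRJOB": 1, "F3EVRJOB": 1, "F3QWT": 150.0},
    {"F2EVRJOB": 1, "F3EVRJOB": None, "F3QWT": 90.0},
    {"F2EVRJOB": 0, "F3EVRJOB": 0, "F3QWT": 200.0},
    {"F2EVRJOB": 0, "F3EVRJOB": None, "F3QWT": 110.0},
]
result = weighted_hot_deck(records, "F3EVRJOB", "F2EVRJOB", "F3QWT")
```

Because donors are restricted to the recipient's class, the quality of the imputed value depends mainly on how well the class variables predict the item.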

Table 32. ELS:2002 third follow-up imputation variables, by number and weighted proportion imputed

Variable name   Description                                                                Number of cases imputed  Weighted percent imputed
F3EVRJOB        Ever held a job for pay since high school                                  3                        0.03
F3EVRATT        Ever attended a postsecondary institution                                  4                        0.04
F3EDSTAT        Education status                                                           6                        0.06
F3STEXP         Highest level of education expected to complete                            23                       0.18
F3ATTAINMENT    Highest level of education attained                                        183                      1.40
F3ERN2011       Total job earnings in 2011 calendar year                                   612                      4.63
F3ONET6AGE30    Occupation at age 30                                                       862                      6.69
F3EMPSTAT       Employment status                                                          1,125                    8.90
F3STLOANEVR     Whether respondent took out any student/postsecondary education loans      1,032                    7.87
F3STLOANAMT     Total amount borrowed in student loans                                     1,183                    9.00
F3SPERN2011     2011 employment income: spouse/partner only                                1,184                    9.52
F3STLOANPAY     Amount currently paid monthly toward student loan balance                  1,092                    8.29
F3HS2BA         Number of months between high school completion and bachelor’s completion  880                      5.57
F3PS2BA         Number of months between postsecondary entry and bachelor’s completion     995                      6.36

SOURCE: U.S. Department of Education, National Center for Education Statistics, Education Longitudinal Study of 2002 
(ELS:2002), Third Follow-up, 2012. 


Imputation classes were constructed for each imputed variable. Some imputation classes 
were constructed directly while others were identified using a recursive partitioning function 
created in R®. 35 With recursive partitioning (also known as a nonparametric classification tree or 
classification and regression tree [CART] analysis), the association between a set of items and the 
variable requiring imputation is statistically tested (Breiman et al. 1984). The result is a set of 
imputation classes formed by the cross-classification of the items that are most predictive of the 
variable in question. The pattern of missing items within the imputation classes is expected to 
occur randomly so that the WSHD procedure can be used. Table 33 lists the initial group of 
potential classification variables used during the imputation procedure. In each step of the 
imputation procedure, previously imputed variables were also added to the potential classification 
group to identify associations among the imputation variables themselves. 


35 See www.r-project.org. 

Table 33. Initial group of potential classification variables for ELS:2002 third follow-up imputation 

Variable name  Description
BYREGION       Geographic region of school
BYSCTRL        School control
BYURBAN        School urbanicity
F1RACE         First follow-up student’s race/ethnicity-composite
F1SES1QR       First follow-up quartile coding of SES1 variable (restricted)
F1SEX          First follow-up sex-composite
F3HSSTAT       High school completion status (updated version of F2HSSTAT)
F2EVRJOB       Ever held a job since leaving high school - composite
F2EVRATT       Whether has ever attended a postsecondary institution - composite
F2STEXP        Highest level of education respondent expects to complete - composite

SOURCE: U.S. Department of Education, National Center for Education Statistics, Education Longitudinal Study of 2002 
(ELS:2002), Base Year (2002) to Third Follow-up, 2012. 


In addition to questionnaire items used to form the imputation classes, sorting variables 
were used within each class to increase the chance of obtaining a close match between donor and 
recipient. If more than one sorting variable was chosen, a serpentine sort was performed where 
the direction of the sort (ascending or descending) changed each time the value of a variable 
changed. The serpentine sort minimized the change in the student characteristics every time one 
of the variables changed its value. Table 34 shows the method of imputation used, variables used 
to construct the classification, as well as sorting variables for each step of the imputation process. 
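
A serpentine sort of this kind can be sketched in Python as follows. The sketch reverses the sort direction of each lower-level variable every time the variable above it changes value, so that neighboring records differ as little as possible. The function name and the six example records are hypothetical; BYREGION and BYURBAN are used only as familiar sort variable names, and the coded values are made up.

```python
from itertools import groupby

def serpentine_sort(rows, keys):
    """Sort rows by `keys`, flipping the direction of each deeper key
    whenever the value of the key above it changes."""
    flips = [False] * len(keys)  # current direction per key (True = descending)

    def sort_level(subset, level):
        if level == len(keys):
            return list(subset)
        key = keys[level]
        ordered = sorted(subset, key=lambda r: r[key], reverse=flips[level])
        result = []
        for _, group in groupby(ordered, key=lambda r: r[key]):
            result.extend(sort_level(list(group), level + 1))
            # The value at this level is about to change: reverse the next level.
            if level + 1 < len(keys):
                flips[level + 1] = not flips[level + 1]
        return result

    return sort_level(rows, 0)

# Hypothetical records with made-up codes.
rows = [{"BYREGION": r, "BYURBAN": u} for r in (2, 1) for u in (3, 1, 2)]
ordered = serpentine_sort(rows, ["BYREGION", "BYURBAN"])
pairs = [(r["BYREGION"], r["BYURBAN"]) for r in ordered]
print(pairs)
```

In the example, the second variable runs 1, 2, 3 within the first region and 3, 2, 1 within the second, so the records on either side of the region boundary share the same value of the second variable.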


Table 34. Details of imputation procedures for ELS:2002 third follow-up imputation variables 

Imputation variable 1                       Method of           Imputation class variables                  Sort variables within class
                                            classification 2,3
Ever held a job for pay since high school   Static              F2EVRJOB                                    BYREGION BYSCTRL BYURBAN
(F3EVRJOB)
Ever attended a postsecondary institution   Static              F2EVRATT                                    BYREGION BYSCTRL BYURBAN
(F3EVRATT)
Education status (F3EDSTAT)                 CART                F3EVRATT F2STEXP                            BYREGION BYSCTRL BYURBAN
Highest level of education expected to      Static              F2STEXP                                     BYREGION BYSCTRL BYURBAN
complete (F3STEXP)
Highest level of education attained         CART                F3EDSTAT F3EVRATT F3STEXP                   BYREGION BYSCTRL BYURBAN
(F3ATTAINMENT)
Total job earnings in 2011 calendar year    CART                F3ATTAINMENT                                BYREGION BYSCTRL BYURBAN
(F3ERN2011)
Occupation at age 30 (F3ONET6AGE30)         CART                F3STEXP F3ATTAINMENT                        BYREGION BYSCTRL BYURBAN
Employment status (F3EMPSTAT)               CART                F3ERN2011                                   BYREGION BYSCTRL BYURBAN
Whether respondent took out any             CART                F3EVRATT                                    BYREGION BYSCTRL BYURBAN
student/postsecondary loans
(F3STLOANEVR)
Total amount borrowed in student loans      CART                F3STLOANEVR F3STEXP F3ATTAINMENT F3EMPSTAT  BYREGION BYSCTRL BYURBAN
(F3STLOANAMT)
2011 employment income: spouse/partner      CART                F1SEX F1RACE                                BYREGION BYSCTRL BYURBAN
only (F3SPERN2011)
Amount currently paid monthly toward        CART                F3STLOANAMT F3EDSTAT F3ERN2011              BYREGION BYSCTRL BYURBAN
student loan balance (F3STLOANPAY)
Number of months between high school        Static              F3ATTAINMENT                                BYREGION BYSCTRL BYURBAN
completion and bachelor’s completion
(F3HS2BA)
Number of months between postsecondary      Static              F3ATTAINMENT                                BYREGION BYSCTRL BYURBAN
entry and bachelor’s completion (F3PS2BA)

1 The variables are listed in the order in which the missing values were imputed. 
2 CART signifies that WSHD imputation was performed with the assistance of a nonparametric classification and regression tree 
(CART) procedure for classification. 
3 Static signifies that WSHD imputation was performed with a predetermined group of classification variables. 
NOTE: WSHD = weighted sequential hot deck. 
SOURCE: U.S. Department of Education, National Center for Education Statistics, Education Longitudinal Study of 2002 
(ELS:2002), Third Follow-up, 2012. 

6.4.3 Imputation Results 

After the variables were imputed, a set of quality control checks was implemented to 
ensure the highest quality. For example, the number of times a donor was used in the WSHD was 
tracked. If it was determined that a single donor was used too often (e.g., the donor value was 
used in place of three or more missing values) in a single imputation pass, then the classification 
process was altered to ensure a realistic level of variation in the imputed responses. The weighted 
percent distribution of the values before and after the imputation procedure was also compared, 
both within and across the imputation classes, to identify large areas of change. Differences 
greater than 5 percent were flagged and examined to determine whether changes should be made 
to the imputation sort or class variables. 
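
The two checks described above can be sketched in Python as follows. The sketch assumes that "too often" means three or more uses of one donor in a pass and that the 5 percent criterion is applied as a difference of 5 percentage points in the weighted distribution; the second assumption is an interpretation, since the text does not say whether the difference is absolute or relative. All function names and data values are hypothetical.

```python
from collections import Counter

def overused_donors(donor_ids, max_uses=2):
    """Return donors used more than `max_uses` times in one imputation pass."""
    counts = Counter(donor_ids)
    return sorted(d for d, c in counts.items() if c > max_uses)

def weighted_percent(values, weights):
    """Weighted percent distribution of a categorical variable."""
    total = sum(weights)
    dist = {}
    for v, w in zip(values, weights):
        dist[v] = dist.get(v, 0.0) + 100.0 * w / total
    return dist

def flag_shifts(before, after, threshold=5.0):
    """Categories whose weighted percent moved by more than `threshold` points."""
    cats = set(before) | set(after)
    return sorted(c for c in cats
                  if abs(after.get(c, 0.0) - before.get(c, 0.0)) > threshold)

# Hypothetical donor log and before/after distributions.
overused = overused_donors(["d1", "d2", "d1", "d3", "d1"])
before = weighted_percent(["yes", "no", "yes"], [1.0, 1.0, 2.0])
after = weighted_percent(["yes", "no", "yes", "no"], [1.0, 1.0, 2.0, 1.0])
flagged = flag_shifts(before, after)
```

In the example, donor d1 is used three times and both categories shift by 15 points, so the donor and both categories would be sent for review.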

Multivariate consistency checks ensured that relationships between the imputation 
variables were maintained and that any special instructions for the imputation were implemented 
properly. For these checks, it was important to ensure that the imputation process did not create 
any new relationships that did not already exist in the item respondent data. For variables that 
had multiple consistency checks to track, it was necessary to evaluate all checks after the 
imputations were complete and reimpute a small number of cases that were inconsistent. These 
cases were originally imputed inconsistently because including all necessary variables in the 
classification would have produced imputation classes that contained no donor records. Once the 
cases for reimputation were identified, they were 
reimputed (stochastically) using the same classes as their original pass but the donor pool was 
restricted to maintain consistency. 

6.5 Data Security; Third Follow-up Disclosure Risk Analysis and Protections 

Data security was a pervasive concern for the third follow-up. Extensive confidentiality 
and data security procedures were employed for ELS:2002 data collection and data processing 
activities; some of those procedures are summarized briefly here. All project staff members 
signed confidentiality agreements and affidavits of nondisclosure and are prohibited by law from 
using the obtained information for any purposes other than this research study. The third follow-up 
interview data were collected via the Web on a server protected with a Secure Sockets Layer 
encryption policy, which forces all data transferred to or from the website to be encrypted and 
transmitted only via secure (HTTPS) connection to conforming web browsers. Sample members 
received an e-mail and a lead letter that described the purpose of the study, and contained the 
URL to the ELS:2002 secure website and a user ID number and strong randomly generated 
credential which allowed them access to the web-based interview. The only mechanism of access 
to the self-administered web-based interview was through this ID number and credential. Sample 
members could only access their individual case using their ID number and credential; they 
could not access data or information about anyone else. The ID numbers provided to sample 
members were completely different from the data IDs included on the restricted- and public-use 
data files. Data were prepared in accordance with NCES-approved disclosure avoidance plans. 
The data disclosure guidelines are designed to minimize the possibility of a data user being able 
to identify individuals on the file by matching outliers or other unique data to external data 
sources. 

Because of the paramount importance of protecting the confidentiality of NCES data that 
contain information about specific individuals, ELS:2002 third follow-up data files were subject 
to various procedures to minimize disclosure risk. The ELS:2002 third follow-up data products 
and the disclosure treatment methods employed to produce them are described in the following 
sections. 

6.5.1 Third Follow-up Data Products 

The set of data products produced for the ELS:2002 third follow-up is similar to the set 
of data products produced in the base year and first and second follow-ups in that a public-use 
and a restricted-use data file were created. 

The disclosure treatment developed for the ELS:2002 third follow-up is composed of 
several steps: 

• Review the collected data and identify items that may increase risk of disclosure. 

• Apply disclosure treatment 36 to these risky items to lower risk of disclosure. 

• Produce a restricted-use data file that incorporates the disclosure-treated data. 

• Produce a public-use data file that incorporates the disclosure-treated data. 

The disclosure treatment methods used to produce the ELS:2002 third follow-up data 
files include variable recoding, variable suppression, and swapping. These methods are described 
below. 

6.5.2 Recoding, Suppression, and Swapping 

Some of the data used during data collection activities were deemed to be too identifying 
and were not included in either the restricted-use or the public-use data file. Other data were 
retained in the restricted-use file but were deemed too identifying for the public-use file and 
were excluded from it. 

For items in the restricted-use file, recoding was used to produce more analytically useful 
variables. Some restricted-use variables were recoded for inclusion in the public-use data file. 
The recoding for the public-use file was necessary because some items had values that occurred 
with extremely low frequencies in the restricted-use data file; these items were therefore 
recoded to ensure that all values of all items occurred with a reasonable frequency in the 
public-use file. 

Swapping was applied to ELS:2002 data items determined to potentially increase risk of 
disclosure. Respondents were randomly selected for swapping to achieve a specific, but 
undisclosed, swapping rate. In data swapping, the values of the variables being swapped are 
exchanged between carefully selected pairs of records: a target record and a donor record. By so 
doing, even if a tentative identification of an individual is made, because every case in the file 
has some undisclosed probability of having been swapped, uncertainty remains about the 
accuracy and interpretation of the match. The third follow-up swapping was done independently 
of the swapping conducted in the base year, the first follow-up, and the second follow-up. 
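
The mechanics of exchanging values between a target record and a donor record can be illustrated with the Python sketch below. It is a generic illustration only: the actual pair-selection rules and the swapping rate for ELS:2002 are undisclosed, so the sketch assumes random pairing and a made-up rate. The function name, the record layout, and the variable names id, region, and income are all hypothetical.

```python
import random

def swap_values(records, swap_vars, rate, seed=0):
    """Exchange `swap_vars` between randomly paired records.

    Assumption: partners are chosen at random here. The source describes
    carefully selected target/donor pairs but does not publish the rules.
    `rate` is the share of records selected as targets.
    """
    rng = random.Random(seed)
    out = [dict(r) for r in records]          # leave the input untouched
    n_pairs = min(int(len(out) * rate), len(out) // 2)
    order = list(range(len(out)))
    rng.shuffle(order)
    # Take disjoint (target, donor) index pairs from the shuffled order.
    for k in range(n_pairs):
        target, donor = order[2 * k], order[2 * k + 1]
        for var in swap_vars:
            out[target][var], out[donor][var] = out[donor][var], out[target][var]
    return out

# Hypothetical file of 10 records; only "region" is swapped.
records = [{"id": i, "region": i % 4, "income": 1000 * i} for i in range(10)]
swapped = swap_values(records, swap_vars=["region"], rate=0.2)
```

Swapping leaves the marginal distribution of each swapped variable unchanged, which is why the quality checks that follow concentrate on relationships between variables.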

Because perturbation (swapping) of the ELS:2002 data may change the relationships 
between data items, an extensive data quality check was carried out to limit the impact of 
swapping on relationships. Before-and-after weighted distributions and correlations for swapped 
variables show that, after applying the disclosure limitation techniques, the analytic utility of the 
data files was not compromised. 


36 The NCES Statistical Standards (Seastrom 2003) (http://nces.ed.gov/statprog/2002/std4_2.asp), specifically NCES Standard 
4-2, provide information both about the legislative background and legal requirements of maintaining confidentiality, and 
definitions of key terms (perturbation, coarsening, disclosure risk analysis, data swapping, and so forth). 

6.6 Third Follow-up Unit and Item Nonresponse Bias Analysis 

6.6.1 Unit Nonresponse Bias Analysis 

Unit nonresponse causes bias in survey estimates when the outcomes of respondents and 
nonrespondents are different. For the ELS:2002 third follow-up, student response is defined as 
the sample member completing at least a specified portion of the questionnaire. The weighted 
response rate was 77.8 percent overall. The domains selected for the unit nonresponse bias 
analysis were derived from the subpopulations listed in section 6.2. Examples of subpopulations 
used in the nonresponse bias analysis are given below: 

• Spring 2002 Black 10th-grade students 

• Spring 2002 Hispanic 10th-grade students 

• Spring 2002 Asian 10th-grade students 

• Spring 2002 White/other 10th-grade students 

• Spring 2002 public school 10th-grade students 

• Spring 2002 Catholic school 10th-grade students 

• Spring 2002 other private school 10th-grade students 

• Spring 2002 10th-grade students who graduated by August 31, 2004 

• Spring 2004 Black 12th-grade students 

• Spring 2004 Hispanic 12th-grade students 

• Spring 2004 Asian 12th-grade students 

• Spring 2004 White/other 12th-grade students 

• Spring 2004 public school 12th-grade students 

• Spring 2004 Catholic school 12th-grade students 

• Spring 2004 other private school 12th-grade students 

• Spring 2004 12th-grade students who graduated by August 31, 2004 

Because the overall response rate was below 85 percent, nonresponse bias analyses were 
conducted as required under NCES standards. All third follow-up weights described in section 
6.2 were used in the nonresponse bias analyses. 

The nonresponse bias was estimated for variables known for both respondents and 
nonrespondents. Because the sample for the third follow-up consists of respondents from the 
base year, the first follow-up, and the second follow-up, sample member data used in the 
nonresponse bias analysis were available, although some of the available data may have been 
imputed. 37 The sample member data that were used include: 

• Student race/ethnicity 

• Student sex 

• Student’s native language 

• Family composition 

• Parents’ highest level of education 

• Mother/female guardian’s occupation 

• Father/male guardian’s occupation 

• Total family income from all sources 

• Socioeconomic status (SES) 

The sample member’s Spring 2004 enrollment status was also used and defined as 
follows: 

• in school, in grade (in grade 12); 

• in school, out of grade (in grade 10 or 11, ungraded, or graduated early); and 

• out of school (dropout or homeschooled). 

The sample member cohort flags were also used: 

• G10COHRT: indicates a member of the sophomore cohort (i.e., Spring 2002 10th-grader) 

• G12COHRT: indicates a member of the senior cohort (i.e., Spring 2004 12th-grader) 

There were also extensive data available for schools from the base-year school 
administrator questionnaire, so these data were used to help reduce potential nonresponse bias. 
Students were linked to the base-year school from which they were sampled. The school 
sampling frame constructed from the Common Core of Data and Private School Universe Survey 
also contains data for all base-year schools. School data used included the following: 

• school sector; 

• urbanicity; 

• region; 

• sophomore enrollment; 


37 For example, some base-year nonrespondents were sampled for inclusion in the first follow-up study. Some of these base-year 
nonrespondents responded in the first follow-up and some base-year data were collected on these individuals. If these individuals 
did not provide these base-year data in the first follow-up questionnaires then some of their base-year data were imputed. 

• total enrollment; 

• number of minutes per class; 

• number of class periods; 

• number of school days; 

• number of students receiving free or reduced-price lunch; 

• number of full-time teachers; 

• percentage of full-time teachers certified; 

• number of part-time teachers; 

• number of grades taught at the school; 

• school level; 

• coeducational status; 

• percentage of students with an individualized education program; 

• percentage of students with limited English proficiency; 

• percentage of Hispanic or Latino sophomores; 

• percentage of Asian sophomores; 

• percentage of Black or African American sophomores; and 

• percentage of all other race sophomores (includes White). 

The procedures used for the nonresponse bias analysis were similar to those used in the 
base year, first follow-up, and second follow-up. First, sample member data known for both 
respondents and nonrespondents were identified. Second, because the set of data known for both 
respondents and nonrespondents was limited, all of these data were incorporated into 
nonresponse models used for the third follow-up. The nonresponse adjustments described in 
section 6.2 were designed to significantly reduce or eliminate nonresponse bias for variables 
included in the models. Variables not known for most respondents and nonrespondents could not 
be included in the nonresponse adjustments, and therefore nonresponse bias could not explicitly 
be reduced for these variables. However, many of the variables in the nonresponse models are 
correlated with many of the other variables. 

Third, after the sample member weights were computed, remaining bias for data known 
for most respondents and nonrespondents was estimated and statistically tested to check if there 
was any remaining significant nonresponse bias. Fourth, the remaining bias after student weight 
adjustments was divided by the standard error, that is, bias/standard error. 

The bias in an estimated mean based on respondents, $\bar{y}_R$, is the difference between this 
mean and the target parameter, $\pi$ (i.e., the mean that would be estimated if a complete census of 
the target population was conducted). This bias can be expressed as follows: 

\[ B(\bar{y}_R) = \bar{y}_R - \pi \]

The estimated mean based on nonrespondents, $\bar{y}_{NR}$, can be computed if data for the 
particular variable for most of the nonrespondents are available. The estimation of $\pi$ is as 
follows: 

\[ \hat{\pi} = (1 - \eta)\,\bar{y}_R + \eta\,\bar{y}_{NR} \]

where $\eta$ is the weighted unit nonresponse rate. For the variables that are from the frame rather 
than from the sample, $\pi$ can be estimated without sampling error. Therefore, the bias can be 
estimated as follows: 

\[ \hat{B}(\bar{y}_R) = \bar{y}_R - \hat{\pi} \]

or, equivalently, 

\[ \hat{B}(\bar{y}_R) = \eta\,(\bar{y}_R - \bar{y}_{NR}) \]

This formula shows that the estimate of the nonresponse bias is the difference between 
the mean for respondents and the mean for nonrespondents multiplied by the weighted 
nonresponse rate. The variance of the bias was computed using Taylor series estimation in RTI’s 
software package SUDAAN. 
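
A small numerical check of this relationship is sketched below in Python. It computes the estimated bias both as the respondent mean minus the full-sample estimate and as the weighted nonresponse rate times the difference between the respondent and nonrespondent means, and shows that the two agree. The function names and the four-case data set are hypothetical, and variance estimation (done in SUDAAN for the study) is not attempted here.

```python
def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def nonresponse_bias(y, w, responded):
    """Return the estimated bias of the respondent mean, computed two ways."""
    resp = [(v, wt) for v, wt, r in zip(y, w, responded) if r]
    nonresp = [(v, wt) for v, wt, r in zip(y, w, responded) if not r]
    ybar_r = weighted_mean(*zip(*resp))
    ybar_nr = weighted_mean(*zip(*nonresp))
    eta = sum(wt for _, wt in nonresp) / sum(w)    # weighted nonresponse rate
    pi_hat = (1 - eta) * ybar_r + eta * ybar_nr    # full-sample estimate
    return eta * (ybar_r - ybar_nr), ybar_r - pi_hat

# Hypothetical 0/1 characteristic, design weights, and response indicators.
y = [1, 1, 0, 0]
w = [30.0, 30.0, 20.0, 20.0]
responded = [True, True, True, False]
bias, bias_alt = nonresponse_bias(y, w, responded)
print(round(bias, 4), round(bias_alt, 4))
```

Here the respondent mean is 0.75, the nonrespondent mean is 0, and the weighted nonresponse rate is 0.2, so both expressions give a bias of about 0.15.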

In addition to calculating and assessing bias, relative bias estimates were also calculated 
and are provided so that bias estimates may be compared across the variables for which bias 
estimates were generated. The estimated relative bias formula follows Biemer and Lyberg (2003). 

Tables H-1a through H-6a in appendix H show the nonresponse bias before and after 
weight adjustments for selected variables for sample members, where F3QWT is used in table 
H-1a, F3QTSCWT is used in table H-2a, F3BYPNLWT is used in table H-3a, F3BYTSCWT is 
used in table H-4a, F3F1PNLWT is used in table H-5a, and F3F1TSCWT is used in table H-6a. 
The tables show estimated bias before nonresponse adjustment for the variables available for 
most responding and nonresponding students. Statistical tests (t tests) were used to test each level 
of the variables for significance of the bias at the 0.05/(c-1) significance level, where c is the 
number of categories (levels) within the primary variable. Tables 35 to 40 provide a summary of 
the before- and after-adjustment significant bias derived from tables H-1a to H-6a, respectively, 
in appendix H. 
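
The adjusted significance level can be illustrated with the short Python sketch below. It divides 0.05 by the number of categories minus one and compares the ratio of bias to standard error against the corresponding two-sided critical value. As a labeled simplification, the sketch uses a standard normal reference distribution in place of the t distribution, which is reasonable only when the degrees of freedom are large. The function name and the numeric inputs are hypothetical.

```python
from statistics import NormalDist

def bias_is_significant(bias, se, n_categories, alpha=0.05):
    """Two-sided test of bias/se at level alpha / (c - 1).

    Assumption: a normal reference distribution stands in for the t
    distribution used in the study.
    """
    level = alpha / (n_categories - 1)
    critical = NormalDist().inv_cdf(1 - level / 2)
    return abs(bias / se) > critical, level

# Hypothetical variable with 5 categories: the level becomes 0.05 / 4.
significant, level = bias_is_significant(bias=0.012, se=0.004, n_categories=5)
not_significant, _ = bias_is_significant(bias=0.008, se=0.004, n_categories=5)
```

Dividing the significance level by the number of categories minus one makes the test for each level stricter, which limits the number of levels flagged by chance alone.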

Table 35. Summary of student bias analysis for F3QWT 

Nonresponse bias statistics 1                             Overall
Before weight adjustments, study member
  Mean percent relative bias across characteristics       2.67
  Median percent relative bias across characteristics     1.52
  Percent of characteristics with significant bias        40.43
After nonresponse weight adjustments, study member
  Mean percent relative bias across characteristics       0.06
  Median percent relative bias across characteristics     0.04
  Percent of characteristics with significant bias        #

# Rounds to zero. 
1 Relative bias and significance calculated on respondents versus full sample. 
SOURCE: U.S. Department of Education, National Center for Education Statistics, Education Longitudinal Study of 2002 
(ELS:2002), Third Follow-up, 2012. 

Table 36. Summary of student bias analysis for F3QTSCWT 

Nonresponse bias statistics 1                             Overall
Before weight adjustments, study member
  Mean percent relative bias across characteristics       2.01
  Median percent relative bias across characteristics     1.71
  Percent of characteristics with significant bias        22.14
After nonresponse weight adjustments, study member
  Mean percent relative bias across characteristics       0.04
  Median percent relative bias across characteristics     0.03
  Percent of characteristics with significant bias        #

# Rounds to zero. 
1 Relative bias and significance calculated on respondents versus full sample. 
SOURCE: U.S. Department of Education, National Center for Education Statistics, Education Longitudinal Study of 2002 
(ELS:2002), Third Follow-up, 2012. 

Table 37. Summary of student bias analysis for F3BYPNLWT 

Nonresponse bias statistics 1                             Overall
Before weight adjustments, study member
  Mean percent relative bias across characteristics       2.65
  Median percent relative bias across characteristics     1.53
  Percent of characteristics with significant bias        42.45
After nonresponse weight adjustments, study member
  Mean percent relative bias across characteristics       0.07
  Median percent relative bias across characteristics     0.04
  Percent of characteristics with significant bias        #

# Rounds to zero. 
1 Relative bias and significance calculated on respondents versus full sample. 
SOURCE: U.S. Department of Education, National Center for Education Statistics, Education Longitudinal Study of 2002 
(ELS:2002), Third Follow-up, 2012. 




Table 38. Summary of student bias analysis for F3BYTSCWT 

Nonresponse bias statistics 1                             Overall
Before weight adjustments, study member
  Mean percent relative bias across characteristics       2.04
  Median percent relative bias across characteristics     1.71
  Percent of characteristics with significant bias        22.31
After nonresponse weight adjustments, study member
  Mean percent relative bias across characteristics       0.04
  Median percent relative bias across characteristics     0.03
  Percent of characteristics with significant bias        #

# Rounds to zero. 
1 Relative bias and significance calculated on respondents versus full sample. 
SOURCE: U.S. Department of Education, National Center for Education Statistics, Education Longitudinal Study of 2002 
(ELS:2002), Third Follow-up, 2012. 

Table 39. Summary of student bias analysis for F3F1PNLWT 

Nonresponse bias statistics 1                             Overall
Before weight adjustments, study member
  Mean percent relative bias across characteristics       3.60
  Median percent relative bias across characteristics     2.16
  Percent of characteristics with significant bias        45.07
After nonresponse weight adjustments, study member
  Mean percent relative bias across characteristics       0.10
  Median percent relative bias across characteristics     0.04
  Percent of characteristics with significant bias        #

# Rounds to zero. 
1 Relative bias and significance calculated on respondents versus full sample. 
SOURCE: U.S. Department of Education, National Center for Education Statistics, Education Longitudinal Study of 2002 
(ELS:2002), Third Follow-up, 2012. 

Table 40. Summary of student bias analysis for F3F1TSCWT 

Nonresponse bias statistics 1                             Overall
Before weight adjustments, study member
  Mean percent relative bias across characteristics       2.18
  Median percent relative bias across characteristics     1.90
  Percent of characteristics with significant bias        29.69
After nonresponse weight adjustments, study member
  Mean percent relative bias across characteristics       0.04
  Median percent relative bias across characteristics     0.03
  Percent of characteristics with significant bias        #

# Rounds to zero. 
1 Relative bias and significance calculated on respondents versus full sample. 
SOURCE: U.S. Department of Education, National Center for Education Statistics, Education Longitudinal Study of 2002 
(ELS:2002), Third Follow-up, 2012. 

Tables H-1b through H-6b in appendix H show the estimated bias after weight 
adjustments (using F3QWT in table H-1b, F3QTSCWT in table H-2b, F3BYPNLWT in table H-3b, 
F3BYTSCWT in table H-4b, F3F1PNLWT in table H-5b, and F3F1TSCWT in table H-6b) for 
the variables available for most responding and nonresponding students. The bias after weight 
adjustments was computed as the difference between the estimate using nonresponse-adjusted 
and calibrated (final) weights and the estimate using the design (base) weights prior to 
nonresponse and calibration adjustment. This latter estimate is an estimate of $\pi$, the population 
parameter of interest, because it is the estimate of the target population using the design weights. 
Similar to the testing of before-adjustment bias, t tests were performed to test the significance of 
the bias for each level of the variables. In tables H-1b through H-6b, the estimated bias usually 
decreased after weight adjustments. Overall, the number of significantly biased levels of 
variables decreased from 58 before adjustment to 1 after adjustment in table H-1, from 30 before 
adjustment to 0 after adjustment in table H-2, from 60 before adjustment to 0 after adjustment in 
table H-3, from 31 before adjustment to 0 after adjustment in table H-4, from 65 before 
adjustment to 1 after adjustment in table H-5, and from 39 before adjustment to 0 after 
adjustment in table H-6. 
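
The after-adjustment comparison can be sketched in Python as follows. The sketch takes the full-sample estimate under the design (base) weights as the benchmark and subtracts it from the respondent estimate under a second set of weights. The function names and all data values are hypothetical; in particular, the final weights below are made up so that the weight of the single nonrespondent is moved to a respondent with the same characteristic.

```python
def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def bias_after_adjustment(y, base_w, resp_w, responded):
    """Respondent estimate under `resp_w` minus base-weight full-sample estimate."""
    target = weighted_mean(y, base_w)   # design-weight estimate of the parameter
    ys = [v for v, r in zip(y, responded) if r]
    ws = [w for w, r in zip(resp_w, responded) if r]
    return weighted_mean(ys, ws) - target

# Hypothetical 0/1 characteristic known for all four sample members.
y = [1, 1, 0, 0]
base_w = [30.0, 30.0, 20.0, 20.0]
final_w = [30.0, 30.0, 40.0, 0.0]       # made-up nonresponse-adjusted weights
responded = [True, True, True, False]

before = bias_after_adjustment(y, base_w, base_w, responded)
after = bias_after_adjustment(y, base_w, final_w, responded)
print(round(before, 4), round(after, 4))
```

In this constructed example the bias falls from about 0.15 to 0 because the adjustment restores the weighted share of the category that the nonrespondent belonged to.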

6.6.2 Item Nonresponse Bias Analysis 

Because the overall weighted unit response rate (77.8 percent) was less than 85 percent, 
an item nonresponse bias analysis was carried out as required under NCES statistical standards. 
The first step in the item-level nonresponse bias analysis was to calculate the weighted 38 item- 
level response rate for every questionnaire item included in the ELS:2002 third follow-up. 
Twenty-nine items were found to have response rates lower than 85 percent. Table 41 shows the 
item-level response rates for the 29 items. 
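
The screening step can be sketched in Python as shown below. The sketch assumes that each item's input lists cover only the respondents who were eligible to answer that item, and it flags items whose weighted response rate is below 85 percent. The function names, the item names ITEM_A and ITEM_B, and the weights are hypothetical.

```python
def weighted_item_response_rate(answered, weights):
    """Percent of eligible weighted cases with a valid answer to an item."""
    total = sum(weights)
    valid = sum(w for a, w in zip(answered, weights) if a)
    return 100.0 * valid / total

def items_needing_bias_analysis(items, weights, threshold=85.0):
    """Names of items whose weighted response rate falls below `threshold`."""
    return [name for name, answered in items.items()
            if weighted_item_response_rate(answered, weights) < threshold]

# Hypothetical analysis weights and answer indicators for two items.
weights = [100.0, 50.0, 50.0, 200.0]
items = {
    "ITEM_A": [True, False, False, True],
    "ITEM_B": [True, True, True, True],
}
rate_a = weighted_item_response_rate(items["ITEM_A"], weights)
low_items = items_needing_bias_analysis(items, weights)
```

ITEM_A has a weighted response rate of 75 percent and is flagged, while ITEM_B is answered by every eligible case and is not.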


38 Weighted item-level response rates used to identify items for which nonresponse bias analyses were required were calculated 
using the third follow-up current-round weight, F3QWT. 

Table 41. Student-level questionnaire items with a weighted item response rate below 85 percent using F3QWT 

Variable name   Variable label                                              Weighted item response rate
F3MOBILITYF1F2  RESIDENTIAL MOBILITY: F1-V.-F2                              84.8
F3A19           TOTAL AMOUNT BORROWED IN STUDENT LOANS                      83.8
F3STLOANAMT     TOTAL AMOUNT BORROWED IN STUDENT LOANS                      83.8
F3ALLINC2011    2011 INCOME FROM ALL SOURCES                                82.8
F3PARAGE        AGE AT WHICH RESPONDENT FIRST HAD A CHILD                   82.6
F3A07           PROGRAM THROUGH WHICH GED/EQUIVALENCY WAS EARNED            82.2
F3A09B          COMPLETED GED TO TRAIN FOR A NEW JOB/CAREER                 82.2
F3A09D          COMPLETED GED TO MEET REQUIREMENTS FOR ADDITIONAL STUDY     82.2
F3D22           SPOUSE/PARTNERS 2011 EMPLOYMENT EARNINGS                    82.0
F3SPERN2011     2011 EMPLOYMENT INCOME: SPOUSE/PARTNER ONLY                 82.0
F3A10           WHETHER CURRENTLY WORKING TOWARDS A GED/EQUIVALENT          82.0
F3A08           STATE IN WHICH GED/EQUIVALENCY WAS EARNED                   81.8
F3A09A          COMPLETED GED TO IMPROVE/ADVANCE/KEEP UP TO DATE ON         81.8
                CURRENT JOB
F3A09E          COMPLETED GED B/C REQUIRED OR ENCOURAGED BY EMPLOYER        81.8
F3A09F          COMPLETED GED FOR PERSONAL OR FAMILY OR SOCIAL REASONS      81.8
F3OCC30F1VF3    CHANGE/STABILITY IN OCCUPATIONAL EXPECTATIONS: F1-V.-F3     81.5
F3A09C          COMPLETED GED TO IMPROVE BASIC READING/WRITING/MATH SKILLS  81.4
F3DEBTALL2011   DEBT/INCOME RATIO: CURRENT DEBT TO 2011 INCOME FROM ALL     81.2
                SOURCES
F3OCC302        WHAT JOB FOR PAY OR OCCUPATION DO YOU PLAN TO HAVE          78.4
                WHEN YOU ARE AGE 30? - 2 DIGIT CODE
F3OCC306        WHAT JOB FOR PAY OR OCCUPATION DO YOU PLAN TO HAVE          78.3
                WHEN YOU ARE AGE 30? - 6 DIGIT CODE
F3D42           FREQUENCY OF VOLUNTEER SERVICE                              77.7
F3OCC30BYVF3    CHANGE/STABILITY IN OCCUPATIONAL EXPECTATIONS: BY-V.-F3     77.6
F3ONET2AGE30    2-DIGIT ONET CODE FOR EXPECTED AGE-30 OCCUPATION            77.4
F3ONET6AGE30    6-DIGIT ONET CODE FOR EXPECTED AGE-30 OCCUPATION            77.3
F3A21           AMOUNT CURRENTLY PAID MONTHLY TOWARD STUDENT LOAN BALANCE   74.7
F3STLOANPAY     AMOUNT CURRENTLY PAID MONTHLY TOWARD STUDENT LOAN BALANCE   74.7
F3D16           WHETHER RESPONDENT LIVES IN PARENT(S) HOME OR PARENTS       66.0
                LIVE IN RESPONDENTS HOME
F3SEIAGE30      SEI-BASED CODE FOR EXPECTED AGE-30 OCCUPATION               65.4
F3B22           HOURS CURRENTLY WORKING PER WEEK ACROSS ALL JOBS            52.2

SOURCE: U.S. Department of Education, National Center for Education Statistics, Education Longitudinal Study of 2002 
(ELS:2002), Third Follow-up, 2012. 


Tables I-1 through I-29 in appendix I compare item respondents and nonrespondents on 
these 29 items by using four characteristics known for all item respondents and item 
nonrespondents. Weighted distributions of the values of these four characteristics were generated 
using both respondents and nonrespondents, using respondents only, and using nonrespondents 
only; these distributions are presented in tables I-1 through I-29. 

Twenty-eight of the variables had at least one statistically significant bias. The only 
variable that did not have any statistically significant bias was F3A08 (STATE IN WHICH 
GED/EQUIVALENCY WAS EARNED). There were 201 significant bias tests out of the 406 
tests that were conducted. 

6.7 Assessment of Responsive Design for ELS:2002 Third Follow-up 

6.7.1 Responsive Design Setting 

As discussed in section 4.3, the goal of responsive design was to identify and target, via 
new protocols or interventions, the nonresponding cases that are different from the respondent 
set at any one point. Although numerous approaches are available to identify target cases (e.g., 
critical subgroups, propensity to respond), the ELS:2002 third follow-up used a Mahalanobis 
distance function to identify nonrespondent cases most unlike the existing respondent set. A 
large number of survey variables, paradata, and sampling frame variables were incorporated into 
the distance function calculation, providing an opportunity to target the cases most unlike 
respondents and therefore, if completed, most likely to reduce nonresponse bias. In this section, 
the bias results are discussed. 
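
The idea of ranking nonrespondents by their distance from the current respondent set can be sketched as follows. This is a textbook Mahalanobis distance computed against the respondent mean and sample covariance; the function names, the two-variable toy data, and the choice to rank from largest to smallest distance are assumptions for illustration, and the sketch does not reproduce the variables or the exact specification used in the ELS:2002 third follow-up.

```python
from statistics import mean


def mean_and_covariance(rows):
    """Column means and sample covariance matrix of a list of numeric rows."""
    n, p = len(rows), len(rows[0])
    mu = [mean(row[j] for row in rows) for j in range(p)]
    cov = [[sum((row[i] - mu[i]) * (row[j] - mu[j]) for row in rows) / (n - 1)
            for j in range(p)] for i in range(p)]
    return mu, cov


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def mahalanobis(x, mu, cov):
    """Distance sqrt((x - mu)' S^-1 (x - mu)) of one case from the respondent set."""
    d = [xi - mi for xi, mi in zip(x, mu)]
    z = solve(cov, d)
    return sum(di * zi for di, zi in zip(d, z)) ** 0.5


def rank_nonrespondents(respondents, nonrespondents):
    """Return (distance, case_id) pairs, most unlike the respondents first."""
    mu, cov = mean_and_covariance(respondents)
    scored = [(mahalanobis(x, mu, cov), case_id) for case_id, x in nonrespondents.items()]
    return sorted(scored, reverse=True)


# Made-up two-variable example (not ELS:2002 data).
respondents = [[0, 0], [2, 0], [0, 2], [2, 2]]
nonrespondents = {"A": [1, 1], "B": [3, 1], "C": [5, 5]}
ranked = rank_nonrespondents(respondents, nonrespondents)
```

Case C sits farthest from the respondent centroid in this toy data, so under the assumed rule it would be the first candidate for targeting.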

6.7.2 Responsive Design Results 

Responsive design, discussed in section 4.3, was integrated into the ELS:2002 third follow-up 
data collection. The analysis of the responsive design implementation is presented in this 
section. Cases that were selected at each of the phases had 
different levels of response. Those selected in phase 1 had an unweighted response rate of 72.9 
percent. The cases selected in phase 2 had an unweighted response rate of 68.5 percent. The 
phase 3 cases had an unweighted response rate of 71.5 percent. 

The primary question is whether data collection outcomes (i.e., bias in key survey 
estimates) were improved by identifying and prioritizing cases using a Mahalanobis distance 
score. To answer this question, 12 key frame and survey variables (the same variables used in the 
distance function calculation) were examined for evidence of bias at multiple points in data 
collection. The 12 variables included a total of 57 levels and therefore 57 bias estimates are 
presented. There were 4,683 cases that responded just before phase 1 case selection. 
Cumulatively, 7,805 cases had responded before phase 2 case selection. Although there were 
3,122 additional respondents between phase 1 case selection and phase 2 case selection, only 322 
were phase 1 selected cases. 

Table J-1 in appendix J lists the variables used to construct bias estimates, and table J-2 
shows the categorization of those variables used in the bias assessment. Table J-3 shows the bias 
estimates for all respondents, untreated respondents, and untreated plus phase 1 respondents. At 
the conclusion of data collection, 35 of the 57 pre-weight-adjustment estimates are biased, that is, 
have values statistically different from zero. If all respondents who received no treatment or were not 
targeted are considered, 41 of the 57 estimates are significantly different from zero, or biased. 
Finally, if the untreated plus phase 1 respondents are considered, 33 of 57 estimates are 
significantly biased. The number of biased estimates goes down slightly if respondents selected 
in phase 1 are included with the untreated respondents. 

Chapter 7. 
Data File Contents 


This chapter describes the Education Longitudinal Study of 2002 (ELS:2002) base-year 
to third follow-up longitudinal data file contents. It addresses the general structure of the data 
files included as well as the nature of the data access and analysis tools available to users, which 
currently include the electronic codebook (ECB) system and the Education Data Analysis Tool 
(EDAT). The chapter also provides details associated with several key data file characteristics 
such as data reserve codes, variable naming conventions, and the nature of composite variables. 
In addition, the chapter provides information regarding how best to analyze ELS:2002 data, 
including guidance on cohort flag and weight selection. 

7.1 Data File Structure 

This section describes the general structure of the ELS:2002 data files and the individual 
file components. 

7.1.1 Base-year to Third Follow-up Longitudinal Data File Structure 

The base-year to third follow-up longitudinal data file combines the full complement of 
base-year to third follow-up data (including high school transcript data). A summary of the 
comprehensive set of files that comprise the base-year to third follow-up data file is provided at 
the end of this sub-section in table 42. 

Because the school and student files include data from multiple sources, they are 
considered to be “megafiles.” The pair includes one megafile at the student level (with other data 
sources supplying contextual data for analysis of the student) and one at the high school level. 

The student-level megafile encompasses the following: base-year student data (student, 
parent, and teacher questionnaires and student tests); base-year school-level data merged to the 
appropriate student case (administrator, library, facilities); first follow-up student data (student, 
transfer, dropout, early graduate, and homeschool questionnaires, student tests and transcripts); 
first follow-up school administrator data merged to the appropriate student case; first follow-up 
high school transcript composites³⁹ merged to the appropriate student case; second follow-up 
questionnaire and composite data; and third follow-up questionnaire and composite data. 
Additional details regarding the student megafile are provided in section 7.1.2. 

The megafile at the school level encompasses base-year data (collected via the facilities 
checklist, school administrator questionnaire, and library media center questionnaire), first 
follow-up school administrator questionnaire data, high school transcript data (including course 
offerings composite data), and data drawn from the Common Core of Data and the Private 
School Universe Survey. Analysts should be aware that the base-year school data may be used as 
a standalone, nationally representative sample of 2001-02 schools with 10th grades, but that the 
school data for the 2003-04 school year are not precisely generalizable to the nation’s 2003-04 
high schools with 12th grades. Additional details regarding the school megafile are provided in 
section 7.1.3. 


39 The content and organization of the high school transcript and course offerings data (course-level file, student-level file, 
school-level file, and course offerings file) are further described in Bozick et al. (2006). 

In addition to the student and high school-level megafiles, the data file includes a pair of 
postsecondary student-institution files. The first, referred to as the second follow-up 
postsecondary student-institution file, links students to postsecondary institutions applied to or 
attended as of the second follow-up interview. The second, referred to as the third follow-up 
postsecondary student-institution attendance file, links students to postsecondary institutions for 
which attendance was reported in either the second or third follow-up. It should be noted that the 
third follow-up student-institution file only includes records associated with third follow-up 
respondents. Additional details regarding the postsecondary student-institution files are provided 
in sections 7.1.4 and 7.1.5. 

The base-year to third follow-up longitudinal data file also includes a collection of extant 
data files containing data imported from external administrative records. Specifics regarding 
which extant sources were drawn upon are provided in section 7.1.6. 


Table 42. Summary of files included within the base-year to third follow-up data file 

File name                               Accessibility              Level                      Record count  Longitudinal composition
Student file (BYF3STU)                  Public and restricted use  Student level              16,197        BY, F1, F2 and F3 student interview, BY parent interview, BY teacher
                                                                                                            interview, school-level (BY and F1) administrator and staff interviews,
                                                                                                            school-level composites, HS transcript composites (F1), postsecondary
                                                                                                            educational transcript composites (F3), extant sources
School file (BYF1TSCH)                  Public and restricted use  School level               1,954         BY and F1 administrator interview, BY facilities checklist, HS transcript
                                                                                                            (F1), extant sources
HS student transcript file (HSTRNSTU)   Restricted use             Student-course level       638,967       HS student transcripts (F1)
HS school transcript file (HSTRNSCH)    Restricted use             School-course level        117,151       HS course catalog (F1)
F2 student-institution file (F2INST)    Public and restricted use  Student-institution level  33,495        F2 interview, IPEDS
F3 student-institution file (F3INST)    Public and restricted use  Student-institution level  20,951        F2 and F3 interview, IPEDS
CPS 2004-05 file (CPS0405)              Restricted use             Student level              6,484         CPS, academic year 2004-05
CPS 2005-06 file (CPS0506)              Restricted use             Student level              5,261         CPS, academic year 2005-06
CPS 2006-07 file (CPS0607)              Restricted use             Student level              4,343         CPS, academic year 2006-07
CPS 2009-10 file (CPS0910)              Restricted use             Student level              3,296         CPS, academic year 2009-10
CPS 2010-11 file (CPS1011)              Restricted use             Student level              3,143         CPS, academic year 2010-11
CPS 2011-12 file (CPS1112)              Restricted use             Student level              2,408         CPS, academic year 2011-12
CPS 2012-13 file (CPS1213)              Restricted use             Student level              2,721         CPS, academic year 2012-13
CPS 2013-14 file (CPS1314)              Restricted use             Student level              1,074         CPS, academic year 2013-14
NSLDS Pell file¹ (NSLDSPL)              Restricted use             Student level              18,277        NSLDS
NSLDS Loan file² (NSLDSLN)              Restricted use             Student level              52,665        NSLDS
GED file (GED)                          Restricted use             Student level              496           GED
Student BRR weight file (BRRSTU)        Public and restricted use  Student level              16,197        BY, F1, F2 and F3 replicate weights
School BRR weight file (BRRSCH)         Public and restricted use  School level               751           BY and F1 replicate weights

1 The NSLDS Pell file contains all Pell records available as of the third follow-up. Additional notes related to the cumulative nature of 
this file are provided in section 7.1.6. 

2 The NSLDS loan file contains all loan records available as of the third follow-up. Additional notes related to the cumulative nature 
of this file are provided in section 7.1.6. 

NOTE: BRR = balanced repeated replicate. BY = base year. CPS = Central Processing System. F1 = first follow-up. F2 = second 
follow-up. F3 = third follow-up. GED = General Educational Development credential. HS = high school. IPEDS = Integrated 
Postsecondary Education Data System. NSLDS = National Student Loan Data System. 

SOURCE: U.S. Department of Education, National Center for Education Statistics, Education Longitudinal Study of 2002 
(ELS:2002), Third Follow-up, 2012. 

7.1.2 Student Megafile 

The student file contains all prior-round data,⁴⁰ retaining the basic structure as in the 
base-year to second follow-up longitudinal data file. Variables introduced as of the third follow-up 
were typically added to new sections and then inserted into a logical grouping of sections (i.e., 
composites, sample member response data, school replicated data). The section titled “ID and 
Universe Variables” is an exception in spanning rounds of data collection. 


40 Although all data elements have been retained, not all base-year, first follow-up, and second follow-up data have been carried 
over. Specifically, in two rare instances data have been expunged: past data for deceased sample members, and data for sample 
members who withdrew their participation with instructions that past data be dropped. 

Sections of the student file (BYF3STU) are as follows: 

• ID and Universe Variables; 

• Base-year (BY) Weights and Composites; 

• First follow-up (F1) Weights and Composites; 

• High School Transcript Composites; 

• Second follow-up (F2) Weights and Composites; 

• F2 Extant Data Source Composites; 

• Third follow-up (F3) Weights and Composites; 

• F3 Extant Data Source Composites; 

• BY Student Questionnaire; 

• F1 Student Questionnaire; 

• F1 Dropout Questionnaire; 

• F1 Transfer Questionnaire; 

• F1 Early Graduate Questionnaire; 

• F1 New Participant Supplement; 

• F2 Survey; 

• F3 Survey; 

• BY Parent Questionnaire; 

• BY Teacher Questionnaire (English); 

• BY Teacher Questionnaire (Math); 

• BY School Composites; 

• F1 School Composites; 

• BY Administrator Questionnaire; 

• F1 Administrator Questionnaire; 

• BY Library Questionnaire; and 

• BY Facilities Checklist. 

7.1.3 High School Megafile 

The school file reflects data for the base-year survey, first follow-up survey, and first 
follow-up high school transcript (course offerings) data composites. The first follow-up was the 
final round for collection of school-level data directly from high schools. Common Core of Data 
and Private School Universe Survey data were added to the restricted-use ECB as a convenience 
to the ECB user. The School ID (Sch ID) is the unique key identifier on the file and is 
constructed such that student data file records can be merged with the high school data. It should 
be noted that the student data file includes two high school linkage IDs: one associated with the 
student’s base-year school (Sch ID) and the other associated with the school the student was 
attending (if still enrolled) as of the first follow-up (F1SchID). Analysts can merge the student 
and school files on either school ID, depending on the time frame of interest. 
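
The choice between the two linkage IDs can be sketched as a simple keyed lookup. This is an illustration under stated assumptions: the column spellings SCH_ID, F1SCH_ID, and STU_ID, the SECTOR field, the SCHOOL_ prefix, and the function name are all hypothetical, and the records are invented.

```python
def merge_student_school(students, schools, link="SCH_ID"):
    """Attach school-file fields to each student record via the chosen linkage ID.

    link is the student-file column that holds the school ID: the base-year
    school (assumed "SCH_ID") or the first follow-up school (assumed "F1SCH_ID").
    The school file itself is always keyed on its own school ID.
    """
    school_by_id = {school["SCH_ID"]: school for school in schools}
    merged = []
    for student in students:
        row = dict(student)
        school = school_by_id.get(student.get(link))
        if school is not None:
            for name, value in school.items():
                if name != "SCH_ID":
                    # Prefix school fields so they cannot overwrite student fields.
                    row["SCHOOL_" + name] = value
        merged.append(row)
    return merged


# Invented records: student 2 was no longer enrolled at the first follow-up.
students = [
    {"STU_ID": 1, "SCH_ID": 10, "F1SCH_ID": 20},
    {"STU_ID": 2, "SCH_ID": 10, "F1SCH_ID": None},
]
schools = [
    {"SCH_ID": 10, "SECTOR": "public"},
    {"SCH_ID": 20, "SECTOR": "private"},
]
by_base_year = merge_student_school(students, schools, link="SCH_ID")
by_first_followup = merge_student_school(students, schools, link="F1SCH_ID")
```

Merging on the base-year ID attaches school 10 to both students, while merging on the first follow-up ID attaches school 20 to student 1 and leaves student 2 without school fields.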

7.1.4 Second Follow-up Postsecondary Student-Institution File 

The postsecondary student-institution file is a student-level file that links students to 
postsecondary institutions applied to or attended as of the second follow-up interview. The key 
on the file is StuID, order number, and Integrated Postsecondary Education Data System ID. 
Data for the institutions were obtained in the second follow-up interview, and collected by 
looping over each institution for a series of questions about application and attendance, among 
others. The looped iterations were normalized (one record for each unique postsecondary 
student-institution pair) and placed into the student-institution file structure. The order number 
enables researchers to associate information for a given institution from the student-level file 
with information about the given institution. The order number also helps researchers determine a 
uniquely identifiable key and allows users to easily link institution-based items from the 
student file to the institution file. 

If the respondent reported attending one postsecondary institution, this institution is listed 
first for that student. If the respondent indicated attending more than one postsecondary 
institution, the one the respondent attended first would be listed first and so on. Institutions that 
respondents applied to but did not attend follow in the order they were named in the interview. 

7.1.5 Third Follow-up Postsecondary Student-Institution Attendance File 

The postsecondary student-institution attendance file is a student-level file that links 
students to postsecondary institutions for which attendance was reported as of the third follow-up 
interview. The key on the file is Stu ID, order number, and Integrated Postsecondary Education 
Data System ID. Data for institutions reported via the third follow-up interview were collected 
by looping over each institution for a series of questions about attendance as well as questions 
regarding the nature of any credentials earned. The file contains one record for each unique 
postsecondary student-institution pair. 

The ordering of the third follow-up student-institution file is consistent with that of the 
second follow-up student-institution file. If a given respondent reported attending a single 
postsecondary institution, this institution is the only one listed for that student. If the respondent 
indicated attending more than one postsecondary institution, the one the respondent attended first 
would be listed first and so on. 

Data associated with credentials earned for any given institution are presented in two 
“blocks,” meaning two subsets of variables that follow a specific naming convention. There are 
four basic data patterns regarding data stored in credential block #1 (variables following the 
naming convention F3ICREDxxxx_1) and credential block #2 (variables following the naming 
convention F3ICREDxxxx_2): 

• The respondent has not earned a credential from the associated postsecondary institution 
(F3ICREDNUM = 0): both credential block #1 and credential block #2 will be empty 
(i.e., legitimate skips). 

• The respondent has earned a single credential from the associated postsecondary 
institution (F3ICREDNUM = 1): credential block #1 will store information about the 
highest/only credential the respondent earned from that postsecondary institution, and 
credential block #2 will be empty (i.e., legitimate skips). 

• The respondent has earned more than one credential from the associated postsecondary 
institution (F3ICREDNUM = 2), and the first credential earned from that institution is 
not the same as the highest credential earned from that institution: credential block #1 
will store information about the highest credential the respondent has earned from that 
institution, and credential block #2 will store information about the first credential earned 
from that institution. 

• The respondent has earned more than one credential from the associated postsecondary 
institution (F3ICREDNUM = 2), and the first credential earned from that institution is the 
highest credential earned from that institution: credential block #1 will store information 
about the highest/first credential the respondent has earned from that institution, and 
credential block #2 will store information about an additional credential earned from that 
institution (this data pattern is less common than the third pattern described above). 
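
The four patterns above can be summarized in a small lookup. This is a sketch only: the function name and the first_is_highest argument are hypothetical, since the text does not say which variable records whether the first credential is also the highest, and the returned strings are descriptive labels, not values stored on the file.

```python
def describe_credential_blocks(cred_num, first_is_highest=None):
    """Describe what credential blocks #1 and #2 hold for one student-institution record.

    cred_num mirrors F3ICREDNUM (0, 1, or 2, where 2 means more than one credential).
    first_is_highest is a hypothetical flag needed only when cred_num is 2.
    """
    if cred_num == 0:
        return {"block_1": "legitimate skip", "block_2": "legitimate skip"}
    if cred_num == 1:
        return {"block_1": "highest/only credential", "block_2": "legitimate skip"}
    if cred_num == 2:
        if first_is_highest is None:
            raise ValueError("first_is_highest is required when cred_num is 2")
        if first_is_highest:
            return {"block_1": "highest/first credential",
                    "block_2": "additional credential"}
        return {"block_1": "highest credential", "block_2": "first credential"}
    raise ValueError(f"unexpected F3ICREDNUM value: {cred_num}")


pattern = describe_credential_blocks(2, first_is_highest=False)
```

In every pattern block #1 holds the highest credential whenever one exists, so analyses of highest attainment at an institution would only need block #1 under this reading.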

It should be noted that the third follow-up student-institution file only includes data 
associated with third follow-up respondents. Postsecondary institutions reported to have been 
attended during the second follow-up interview were referenced in the third follow-up interview, 
and the associated data were carried forward to the third follow-up student-institution file. 
Records associated with second follow-up reported attendance for third follow-up 
nonrespondents were excluded from the third follow-up student-institution file. Such records are 
still available on the second follow-up postsecondary institution file. 

7.1.6 Extant Data Source Files: Ancillary Data Links in the ELS:2002 Base-year to 
Third Follow-up Electronic Codebook 

Rather than merge data from extant data sources on the student file, separate files were 
constructed that can be linked to the student file using Stu ID as the key. Sample members will 
have one record on the Central Processing System (CPS), Scholastic Aptitude Test (SAT), ACT, 
and General Educational Development (GED) data source files, and one or more records on the 
National Student Loan Data System (NSLDS) data source files when data are available. If 
information is not available for a sample member from a given data source, then the student 
record will be excluded from that data source file. The following data source files were used: 

• the Central Processing System⁴¹; 

• the National Student Loan Data System⁴²; 

• the Scholastic Aptitude Test; 

• the ACT; and 

• the General Educational Development. 

Variables representing the extant sources data imported into the second and third 
follow-up rounds are described in appendix K and listed in appendix L of this document. Some 
composite variables have been constructed and included on the student megafile to facilitate use 
of data associated with each extant data source. 
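
The linkage described in this section mixes one-record-per-student sources (for example, GED) with sources that can hold several records per student (NSLDS). A minimal sketch of linking both kinds to the student file follows; the key spelling STU_ID, the field names GED_YEAR and LOAN_AMT, and the function names are assumptions for illustration, and the records are invented.

```python
from collections import defaultdict


def index_one_to_many(records, key="STU_ID"):
    """Group records that may repeat per student (for example, NSLDS loan records)."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec[key]].append(rec)
    return dict(grouped)


def attach_extant(students, ged_records, loan_records, key="STU_ID"):
    """Link a one-record source and a multi-record source to each student.

    Students absent from a source get None (GED) or an empty list (loans),
    mirroring the fact that such students have no record on that source file.
    """
    ged_by_id = {rec[key]: rec for rec in ged_records}
    loans_by_id = index_one_to_many(loan_records, key)
    linked = []
    for student in students:
        sid = student[key]
        linked.append({**student,
                       "ged": ged_by_id.get(sid),
                       "loans": loans_by_id.get(sid, [])})
    return linked


# Invented records with hypothetical field names.
students = [{"STU_ID": 1}, {"STU_ID": 2}, {"STU_ID": 3}]
ged = [{"STU_ID": 2, "GED_YEAR": 2006}]
loans = [
    {"STU_ID": 1, "LOAN_AMT": 3500},
    {"STU_ID": 1, "LOAN_AMT": 5500},
    {"STU_ID": 3, "LOAN_AMT": 2000},
]
linked = attach_extant(students, ged, loans)
total_borrowed = {row["STU_ID"]: sum(l["LOAN_AMT"] for l in row["loans"]) for row in linked}
```

Keeping the loan records as a list per student avoids duplicating student rows, which would otherwise distort weighted student-level estimates.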

It should be noted that the NSLDS loan and Pell grant data files are cumulative as of the 
third follow-up and may differ slightly in records and content from the versions delivered as part 
of the ELS:2002 second follow-up release. Furthermore, users should note that student-level 
composite variables specific to the second follow-up release (F2PELLCUM for example) were 
constructed using records available as of the second follow-up, whereas third-follow-up-specific 
composites (F3PELLCUM for example) were constructed using the cumulative set of records 
available as of the third follow-up. 

7.2 Data Access and Analysis Tools 

ELS:2002 data have been made available in public-use form via the web-based EDAT 
and PowerStats applications and (for licensed users) in restricted-use form⁴³ via ECB format. The 
restricted-use ECB is installed from a DVD and is designed to be run in a Windows 
environment. ECB software packages are available at no cost from the National Center for 
Education Statistics (NCES). A summary of all data access and analysis tools is provided in 
table 43 below. 


41 The CPS contains Free Application for Federal Student Aid data. 

42 The NSLDS database contains records of all federal loans, and Pell grant information, for anyone who has such a loan or grant. 
The NSLDS loan and Pell grant data source files are cumulative as of the third follow-up. 

43 A license is required to access the restricted-use ECB (http://nces.ed.gov/statprog/confid.asp). 

Table 43. Summary of ELS:2002 data access and analysis tools 

ELS:2002 Base-year to Third Follow-Up 

Detail                            Public use                                                Restricted use
Longitudinal composition          BY, F1, F2, and F3                                        BY, F1, F2, F3, high school transcripts
Data file and documentation       NCES 2014-362                                             NCES 2014-365
  publication number
Access point                      EDAT, PowerStats                                          Request DVD under license
Data file format                  EDAT: One of six programming languages (SAS, Stata,       One of three programming languages
                                    SPSS, Sudaan, R, S-Plus) (or ASCII or CSV)                (SAS, SPSS, and Stata)
                                  PowerStats: Does not allow extraction of raw data
Data file variable selection      EDAT: Tag Files                                           Electronic codebook
                                  PowerStats: Does not provide access to micro data
Analysis capability               EDAT: Does not support interactive analysis;              Does not support interactive analysis;
                                    provides access to micro data                             provides access to micro data
                                  PowerStats: General statistics (percentages, means,
                                    etc.), data tables, table regressions

ELS:2002 Base-year to Third Follow-up Data Availability 

                                    Base year   First follow-up   High school transcripts   Second follow-up   Third follow-up   Postsecondary transcripts
Year conducted                      2002        2004              2005                      2006               2012              2013-2014
Date available                      Now         Now               Now                       Now                January 2014      Spring 2015
Restricted-use (DVD) NCES 2014-362  ✓           ✓                 ✓                         ✓                  ✓                 Released as separate file
Public-use (EDAT) NCES 2014-365     ✓           ✓                                           ✓                  ✓


NOTE: BY = base year. EDAT = Educational Data Analysis Tool. F1 = first follow-up. F2 = second follow-up. F3 = third follow-up. 
SOURCE: U.S. Department of Education, National Center for Education Statistics, Education Longitudinal Study of 2002 
(ELS:2002), Third Follow-up, 2012. 


7.2.1 Electronic Codebook 

The ECB system serves as an electronic version of a fully documented survey codebook. 
It allows the data user to browse through all ELS:2002 variables contained on the data files, 
search variable and value names for keywords related to particular research questions, review the 
wording of these items along with notes and other pertinent information related to them, examine 
the definitions and programs used to develop composite and classification variables, and export 
SAS, SPSS, or Stata syntax programs for statistical analysis. The ECB also provides an 
electronic display of the distribution of counts and percentages for each variable in the dataset. 
Analysts can use the ECB to select or “tag” variables of interest, export codebooks that display 
the distributions of the tagged variables, and generate SAS, SPSS, and Stata program code 
(including variable and value labels) that can be used with the analyst’s own statistical software. 
A guide for using the ECB is accessible via the “View Help” option of the “Help” tab in the 
ECB’s main menu. 

7.2.2 Education Data Analysis Tool 

The EDAT web application, available via the NCES website (http://nces.ed.gov/EDAT), 
allows users to download public-use data files in formats compatible with one of six statistical 
programming languages (SAS, SPSS, S-Plus, Stata, R, and SUDAAN). The contents of 
downloaded files can be customized via the application’s “tag” functionality by which users can 
select variables pertaining to their specific research interests. Additional information regarding 
the EDAT application is available via the NCES website. 

7.2.3 PowerStats Data Analysis Tool 

The PowerStats web application, available via the NCES website, allows users to 
generate simple statistics such as data tables and regressions via an interactive interface. The 
application also allows users to store work associated with their analyses or import (via external 
files) previously generated analyses. Additional information regarding the PowerStats 
application is available via the NCES website. 

7.3 Data File Details 

This section provides an overview of several key data file characteristics. An 
understanding of these characteristics will prove useful with respect to locating variables of 
specific interest and in understanding the nature of the underlying data. 

7.3.1 Reserve Codes 

There are a number of reasons for data to be missing for given variables. We account for 
these situations by filling items with reserve codes. The following reserve code scheme was 
used: 

• -1 “Don’t know.” This reserve code was not used in the third follow-up but is retained for 
prior-round data. 

• -2 “Refused.” This reserve code was not used in the third follow-up but is retained for 
prior-round data. 

• -3 “Item legitimate skip/NA.” Filled for questions that are not answered because prior 
answers route the respondent elsewhere. 

• -4 “Nonrespondent.” Filled for all variables across the entire instrument when a sample 
member did not respond to the instrument. 

• -5 “Suppressed” or “Out of Range.” Used to identify third follow-up restricted-use 
variables appearing on public-use data products as a means of indicating the suppression 
of restricted-use values. (The label for values of -5 in such instances reads “Suppressed.”) 
This reserve code was also used in the base year and first follow-up (when hardcopy 
surveys were employed) for the purpose of identifying out-of-range responses. 

• -6 “Multiple Response.” This reserve code was not used in the third follow-up and is 
retained for prior-round data. 

• -7 “Not administered, abbreviated interview or breakoff.” Filled for abbreviated interview 
respondents for questions not included in the abbreviated questionnaire or for questions 
that are not answered because the respondent has broken off the interview without 
completing it. 

• -8 “Survey component legitimate skip/NA.” Filled for all variables across the entire 
instrument when a sample member does not apply to a particular instrument or round. It 
is similar to -4 in that it applies to all variables across an entire instrument; however, the 
reason is different in that the sample member never had the chance to respond. 

• -9 “Missing.” Filled for questions that are not answered when the routing suggests that 
respondents should have responded. 

It should be noted that, to better distinguish reserve codes from valid (negative) scale 
values, the reserve codes used for three third follow-up index/scale score variables 
(F3JOBSUPP, F3JOBSATIS, and F3JOBPERSIST) are slightly different than those normally 
used in other ELS:2002 variables. Specifically, -33 is used instead of -3 to denote “item 
legitimate skip”; -44 is used instead of -4 to denote “nonrespondent”; -88 is used instead of -8 to 
denote “survey component legitimate skip/NA”; and -99 is used instead of -9 to denote 
“missing.” 
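
Before computing statistics, analysts typically need to set reserve codes to missing. The sketch below encodes the scheme listed above, including the doubled codes for the three index/scale variables; the function names and the use of None as the missing marker are choices made for this example, and the sketch assumes that no variable other than those three carries valid values equal to -1 through -9.

```python
# Reserve codes as listed in this section (labels shortened).
STANDARD_RESERVE = {
    -1: "Don't know",
    -2: "Refused",
    -3: "Item legitimate skip/NA",
    -4: "Nonrespondent",
    -5: "Suppressed or out of range",
    -6: "Multiple response",
    -7: "Not administered, abbreviated interview or breakoff",
    -8: "Survey component legitimate skip/NA",
    -9: "Missing",
}
# The three index/scale variables can take valid negative values,
# so they use doubled codes instead.
SCALE_RESERVE = {
    -33: "Item legitimate skip",
    -44: "Nonrespondent",
    -88: "Survey component legitimate skip/NA",
    -99: "Missing",
}
SCALE_VARIABLES = {"F3JOBSUPP", "F3JOBSATIS", "F3JOBPERSIST"}


def reserve_label(variable, value):
    """Return the reserve-code label for a value, or None if the value is valid data."""
    codes = SCALE_RESERVE if variable in SCALE_VARIABLES else STANDARD_RESERVE
    return codes.get(value)


def to_missing(variable, value):
    """Replace reserve codes with None and pass valid values through unchanged."""
    return None if reserve_label(variable, value) is not None else value


cleaned = [to_missing("F3B22", v) for v in (40, -9, -3, 35)]
```

Applying the variable-specific table matters: a value of -3 on F3JOBSATIS is kept as data under this scheme, whereas -33 is treated as a legitimate skip.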

7.3.2 Variable Naming Conventions 

Data users should find naming conventions for variables, flags, and weights intuitive. 
Most variables begin with an indicator of the round (i.e., base-year variables begin with BY, first 
follow-up with F1, second follow-up with F2, and third follow-up with F3). Weights follow the 
same round-naming convention and also contain the suffix WT (e.g., BYSTUWT is the name for 
the final student weight for base-year questionnaire completion, F3QWT is the equivalent third 
follow-up questionnaire completion weight, F2QWT is the equivalent second follow-up 
questionnaire completion weight, and BYSCHWT is the name for the base-year final school 
weight). 

In the base year and first follow-up (but not the second and third follow-ups), variable 
names also distinguish (in their third character) between components and questionnaire types. 
F1S, for example, indicates a first follow-up student questionnaire variable, whereas F1A stands 
for administrator questionnaire items, and F1D refers to the “out of school” (dropout) 
questionnaire. 

Variables that reflect specific items in the questionnaire carry the question number in the
variable name, immediately after the component indicator. Hence, F1S58 would be item 58 from
the first follow-up student questionnaire, and F1D19 would be item 19 in the dropout instrument.
The round-specific constructed variables are typically not anchored in a single questionnaire item
and may sometimes reflect nonquestionnaire sources of information, such as the assessments.
First follow-up test scores carry the prefix F1TX. F1TXMQU, for example, indicates the quartile
score for the first follow-up mathematics test. Flags are indicated by the suffix FLG or FG.
Variable names also distinguish between the public (P) and restricted (R) use forms, where
variables differ between them (the base-year, first follow-up, and second follow-up public-use
variables are a subset of the restricted-use superset). It should be noted that variables new to the
third follow-up do not have independent restricted-use and public-use versions; rather,
restricted-use data are suppressed or recoded as needed for inclusion on public-use data products.
These third follow-up variables have the same variable names on the restricted-use and public-use
data files.

Finally, some slightly different information is included in second and third follow-up
variable names. In the base year and first follow-up, variable names contain a letter to reference a
questionnaire (e.g., S = Student) in addition to the round prefix (BY or F1) and frequently
reference the question number (composite and transcript variables do not link to specific
questionnaire items so they contain a descriptive reference). The second and third follow-up
instruments are electronic questionnaires with many pathways; there is neither a fixed hardcopy
questionnaire nor question numbers. However, a sequential number within each section or
thematic area has been assigned to each item from the interview. Whenever possible, second and
third follow-up variable names were constructed as follows (using the third follow-up as an example):

F3 {Section Letter} {Sequential Number} {sub-item letter if applicable}

The applicable section letters for the second follow-up round are as follows:

• A — High school section; 

• B — Postsecondary section; 

• C — Employment section; and 

• D — Community section. 

The applicable section letters for the third follow-up round are as follows: 

• A — Current Activities and Education section; 

• B — Current Employment section; 

• C — Employment History and Future section; and 

• D — Family and Community section. 
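
The naming pattern for second and third follow-up interview variables can be sketched as a small parser. This is only an illustration under stated assumptions: the regular expression assumes the name is the round prefix, one section letter A-D, a run of digits, and an optional single sub-item letter, with no separators; the function name and the sample variable names in the usage note are hypothetical.

```python
import re

# Section letters as listed above for each round.
SECTIONS = {
    "F2": {
        "A": "High school",
        "B": "Postsecondary",
        "C": "Employment",
        "D": "Community",
    },
    "F3": {
        "A": "Current Activities and Education",
        "B": "Current Employment",
        "C": "Employment History and Future",
        "D": "Family and Community",
    },
}

# Assumed layout: round prefix, section letter, digits, optional sub-item letter.
INTERVIEW_NAME = re.compile(r"^(F[23])([A-D])(\d+)([A-Z]?)$")


def parse_interview_name(name):
    """Split a sequentially numbered interview variable name into its parts.

    Returns None for names that do not fit the assumed pattern, such as
    composites, weights, or base-year and first follow-up variables.
    """
    match = INTERVIEW_NAME.match(name.upper())
    if match is None:
        return None
    round_prefix, section, sequence, sub_item = match.groups()
    return {
        "round": round_prefix,
        "section": SECTIONS[round_prefix][section],
        "sequence": int(sequence),
        "sub_item": sub_item or None,
    }
```

Under these assumptions, a name such as F3B07A would be read as the third follow-up Current Employment section, item 7, sub-item A, while a composite such as F3JOBSUPP is not matched.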

Variables that do not follow the sequential numbering naming convention are: 

• Postsecondary student-institution variables — These variables were obtained at the 
respondent level and contain information associated with each respondent-identified 
institution. The final file contains one record for each of the institutions that the 
respondent identified in the interview. The variables are named with a descriptive 
reference, where the prefix “F2I” is applied to variables belonging to the second follow-up
student-institution file and “F3I” is applied to third follow-up student-institution file
variables.

• Composites — These variables were given terse, descriptive names (with similar 
descriptive names used across rounds as applicable), prefixed with the round indicator 
(i.e., BY, F1, F2, F3).

• Extant data file variables: 

- CPS data, academic years 2004-05 to 2006-07 — For the CPS files that were 
introduced in the base-year to second follow-up data files (academic years 2004- 
05, 2005-06, and 2006-07), variables were assigned a prefix beginning with “C” 
(taken from the acronym “CPS”) followed by 2 digits consistent with the latter 
year of the applicable academic year span. For example, the name of a variable 
belonging to the CPS 2004-05 file would begin with the prefix “C05.” The 
remainder of the variable name comprises a 3-digit number that is consistent with
the file position of that particular field on the raw source file it was imported 
from. For example, if the 21st variable on the raw CPS 2004-05 source file was 
selected for inclusion in the data file, the resulting variable name would be 
“C05021” (C {2-digit academic year} {3-digit file position}). 

- CPS data, academic years 2009-10 to 2012-13 — The naming convention for 
variables belonging to CPS files introduced in the third follow-up (academic years 
2009-10 through 2012-13) differs slightly from the
one applied to CPS variables imported as of the second follow-up. The variable
prefix is consistent with previously imported variables, where the prefix 
comprises a leading “C” (taken from the acronym “CPS”) followed by 2 digits 
consistent with the latter year of the applicable academic year span. The 
remaining portion of the variable name is descriptive in nature. For example, the 
field associated with the student’s state of legal residence, imported from the raw
CPS 2010-11 source file, is named C11STATE (C {2-digit academic year} {descriptive string}).

- NSLDS Pell data — The variables belonging to the NSLDS Pell file do not abide 
by a formal naming convention. The names are purely descriptive in nature. For 
example, the variable pertaining to “Highest cost of attendance” is named 
HICOA. 

- NSLDS Loan data — The variables belonging to the NSLDS Loan file do not 
abide by a formal naming convention. The names are purely descriptive in nature. 
For example, the variable pertaining to “Current loan status date” is named 
LNSTDATE. 

- Blended Test scores/ACT-SAT Concordance — The naming convention for the 
test score variables comprises two portions, the first of which is a variable prefix. 
The prefix begins with “TX” (representing “test”) followed by a string that is 
descriptive in nature. For example, the variable associated with the “Most recent 
ACT composite score” is named TXACTC (TX {descriptive string}).

- GED data — The naming convention for variables imported from GED comprises
two portions, the first of which is a variable prefix. The prefix begins with
“GED,” consistent with the source acronym, followed by a string that is
descriptive in nature. For example, the field associated with the “Date passed
GED test” is named GEDPASDT (GED {descriptive string}).
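
The two CPS naming conventions described in the list above can be sketched as follows. This is an illustration rather than the procedure actually used to build the files; the function names are hypothetical, and the academic year is assumed to be written in the "2004-05" form used in the text.

```python
def cps_prefix(academic_year):
    """Build the CPS prefix: "C" plus the 2-digit latter year of the span."""
    ending_year = academic_year.split("-")[1]  # "2004-05" -> "05"
    return "C" + ending_year[-2:]


def cps_positional_name(academic_year, file_position):
    """Name for CPS files from 2004-05 to 2006-07: prefix plus 3-digit file position."""
    return f"{cps_prefix(academic_year)}{file_position:03d}"


def cps_descriptive_name(academic_year, descriptor):
    """Name for CPS files from 2009-10 to 2012-13: prefix plus a descriptive string."""
    return cps_prefix(academic_year) + descriptor.upper()
```

The two examples given in the text are reproduced by cps_positional_name("2004-05", 21), which yields C05021, and cps_descriptive_name("2010-11", "STATE"), which yields C11STATE.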

7.3.3 Composite Variables 

Composite variables — also called constructed, derived, or created variables — are usually 
generated using responses from two or more questionnaire items or from recoding of a variable 
(typically for disclosure avoidance reasons). Some are copied from another source (e.g., a 
variable supplied in sampling or imported from an external database). Examples of composite 
variables include school variables (school sector, urbanicity, region of the country), math 
assessment scores (achievement quintile in math), and demographic variables (sex, race, 
Hispanic ethnicity, and month and year of birth). 

The base-year to third follow-up data file includes many composite variables for the 
convenience of data users. Composite variables combine or reorganize data, whereas survey 
variables (that is, using the third follow-up survey as an example, variables named with an 
“F3A,” “F3B,” “F3C,” or “F3D” prefix) represent the data as they were collected in the 
interview. Most of the composite variables can be used as classification variables or independent 
variables in data analysis. For better estimation in analysis, many of the composites have 
undergone imputation procedures for missing data (all imputed versions of variables have been 
flagged and are available in composite variables that are named with an “IM” suffix). 

7.4 Analyzing ELS:2002 Data 

ELS:2002 is designed to produce accurate inference for three populations: 10th-grade
students as of Spring 2002, 12th-grade students as of Spring 2004, and Spring 2002 10th-grade
schools.

To analyze these populations, one must select one of the populations to analyze, identify 
the time periods over which to analyze that population, and identify an appropriate analysis 
weight. 

7.4.1 Analyzing Spring 2002 10th-Grade Schools

There is only one analysis weight appropriate for analyzing the Spring 2002 10th-grade
school population, the school weight (BYSCHWT) constructed in the base year. Longitudinal
analyses (the base-year school 2 years later) of this school population may be carried out using
this base-year weight, and associated base-year school-level data, combined with administrator
responses and school-level transcript information collected in the first follow-up. As noted in
chapter 6, analysis of base-year schools that incorporates data from the first, second, or third
follow-up will support inference to a subpopulation of the base-year population of schools in
2002 containing a sophomore grade. In particular, the ELS:2002 sample of schools is not
representative of schools as of 2004 or any other year aside from 2002. Furthermore, the base-year
school weight is not adjusted for school nonresponse in the first, second, or third follow-up,
nor is the weight adjusted for school closings.

7.4.2 Analyzing Spring 2002 10th-grade Students

Because the third follow-up student-level file contains records for individuals who were
not in the 10th grade in Spring 2002 but were in the 12th grade in Spring 2004, analysts should
first use the variable G10COHRT to restrict analysis to individuals who are in the population of
Spring 2002 10th-grade students. The variable G10COHRT equals 1 if a student was in the 10th
grade in Spring 2002 and equals 0 if a student was not in the 10th grade in Spring 2002.
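
As a minimal sketch of this restriction, the Python below computes a weighted mean over records where G10COHRT equals 1 and the chosen analysis weight is positive. The record layout (one dict per sample member), the function name, and the analysis variable name in the usage note are assumptions made for illustration. Only a point estimate is produced; standard errors for a complex sample design require design-based methods that are not shown here.

```python
def weighted_mean_10th_grade(records, value_var, weight_var):
    """Weighted mean of value_var among Spring 2002 10th-graders.

    records is assumed to be a list of dicts keyed by variable name.
    Records outside the cohort, or with a zero or missing weight, are skipped.
    Reserve codes in value_var are assumed to have been removed beforehand.
    """
    numerator = 0.0
    denominator = 0.0
    for record in records:
        if record.get("G10COHRT") != 1:
            continue  # not a member of the Spring 2002 10th-grade cohort
        weight = record.get(weight_var) or 0
        if weight <= 0:
            continue  # weight is not defined for this sample member
        numerator += weight * record[value_var]
        denominator += weight
    if denominator == 0:
        raise ValueError("no cohort members with a positive weight")
    return numerator / denominator
```

For example, weighted_mean_10th_grade(records, "ENROLLED", "F3QWT") would estimate the proportion of the cohort with a hypothetical 0/1 indicator ENROLLED equal to 1.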

In addition to using G10COHRT = 1 to restrict analysis to appropriate individuals, 
analysts should identify the time period(s) for which analysis will be conducted. A variety of 
analysis weights have been constructed that support analyzing the 10th-grade cohort over various
time periods. The possible time periods and the appropriate analysis weights are shown in table
44. Note that no specific analysis weight was constructed to analyze the Spring 2002 10th-grade
population for some particular combinations of response across the rounds of ELS:2002.

Table 44. Selecting weights to analyze Spring 2002 10th-grade population

BY  F1  F2  F3  TR  Available analysis weight(s)  Optional weights*
--  --  --  --  --  ----------------------------  -------------------------------------
No response data used, only whole population composites
0   0   0   0   1   F1TRSCWT(#)
Response data from single wave
1   0   0   0   0   BYSTUWT, BYEXPWT
0   1   0   0   0   F1QWT, F1EXPWT
0   0   1   0   0   F2QWT
0   0   1   0   1   F2QTSCWT
0   0   0   1   0   F3QWT
0   0   0   1   1   F3QTSCWT
Response data from multiple waves
1   1   0   0   0   F1PNLWT, F1XPNLWT
1   1   0   0   1   No specific analysis weight   F1TRSCWT(*), F1PNLWT, F1XPNLWT
1   0   1   0   0   F2BYWT
1   0   1   0   1   No specific analysis weight   F2QTRSCWT(*), F2BYWT
1   0   0   1   0   F3BYPNLWT
1   0   0   1   1   F3BYTSCWT
1   1   1   0   0   No specific analysis weight   F2F1WT(*), F2BYWT
1   1   1   0   1   No specific analysis weight   F2QTRSCWT(*), F2F1WT, F2BYWT
1   1   0   1   0   No specific analysis weight   F3BYPNLWT(*), F3F1PNLWT
1   1   0   1   1   No specific analysis weight   F3QTRSCWT(*), F3BYPNLWT(*), F3F1PNLWT
1   0   1   1   0   No specific analysis weight   F3BYPNLWT(*)
1   0   1   1   1   No specific analysis weight   F3BYTSCWT(*), F3BYPNLWT
1   1   1   1   0   No specific analysis weight   F3BYPNLWT(*), F3F1PNLWT
1   1   1   1   1   No specific analysis weight   F3BYTSCWT(*), F3BYPNLWT, F3F1PNLWT
0   1   1   0   0   F2F1WT
0   1   1   0   1   No specific analysis weight   F3QTSCWT(*), F2F1WT
0   1   0   1   0   F3F1PNLWT
0   1   0   1   1   F3F1TSCWT
0   0   1   1   0   No specific analysis weight   F3QWT(*)
0   0   1   1   1   No specific analysis weight   F3QTSCWT(*), F3QWT
0   1   1   1   0   No specific analysis weight   F3F1PNLWT(*)
0   1   1   1   1   No specific analysis weight   F3F1TSCWT(*), F3F1PNLWT(*)

* Denotes weight that maximizes number of records.

# F1TRSCWT is defined for base-year or first follow-up respondents as well as first follow-up questionnaire-incapable sample
members who have high school transcript data collected during the first follow-up so some sample members who have a
nonmissing value for this weight would not be included in analyses that incorporate first follow-up interview data.

NOTE: BY = base year. F1 = first follow-up. F2 = second follow-up. F3 = third follow-up. TR = transcript.

SOURCE: U.S. Department of Education, National Center for Education Statistics, Education Longitudinal Study of 2002
(ELS:2002), Third Follow-up, 2012.

For any given response pattern across the four ELS:2002 study rounds, the analysis
weights that were specifically constructed to be representative of the Spring 2002 10th-grade
cohort using individuals who exhibited that particular response pattern are listed in the column labeled
“Available analysis weights.” There are two reasons multiple weights may be available. One
weight was constructed to account for student nonresponse at various rounds and another weight
was constructed to account for schools that did not participate in providing student transcript
information. Additional weights were also constructed to allow analysts to include or exclude
questionnaire-incapable students. Where no specific analysis weights were constructed, such as
for individuals who responded in all four rounds, the set of potential weights that could be used
with these individuals is given in the column labeled “Optional weights.” The weight marked
with an asterisk, in any given row, denotes the weight that provides the most records to use in an
analysis that incorporates data from the rounds identified in the first five columns of that row.
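
One way an analyst might encode table 44 for programmatic lookup is sketched below. Only a handful of rows are transcribed, so the dictionary is an excerpt rather than the full table; the function name and data layout are hypothetical. Where a row has no specific analysis weight, the optional weights are listed with the asterisked weight (the one that maximizes the number of records) first.

```python
# Keys are (BY, F1, F2, F3, TR) indicators taken from table 44.
# Values are (available analysis weights, optional weights).
TABLE_44_EXCERPT = {
    (1, 0, 0, 0, 0): (["BYSTUWT", "BYEXPWT"], []),
    (1, 1, 0, 0, 0): (["F1PNLWT", "F1XPNLWT"], []),
    (1, 0, 0, 1, 0): (["F3BYPNLWT"], []),
    (1, 0, 0, 1, 1): (["F3BYTSCWT"], []),
    (0, 1, 0, 1, 0): (["F3F1PNLWT"], []),
    # No specific analysis weight; F3BYPNLWT carries the asterisk in the table.
    (1, 1, 1, 1, 0): ([], ["F3BYPNLWT", "F3F1PNLWT"]),
}


def candidate_weights(by, f1, f2, f3, tr):
    """Return candidate weights for the rounds used in an analysis.

    Specifically constructed weights are returned when they exist;
    otherwise the optional weights for that row are returned.
    """
    key = (int(by), int(f1), int(f2), int(f3), int(tr))
    if key not in TABLE_44_EXCERPT:
        raise KeyError(f"row {key} is not transcribed in this excerpt")
    available, optional = TABLE_44_EXCERPT[key]
    return available if available else optional
```

For an analysis combining base-year and third follow-up responses, candidate_weights(1, 0, 0, 1, 0) returns F3BYPNLWT; for base-year through third follow-up responses without transcripts, the optional weights F3BYPNLWT and F3F1PNLWT are returned.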

7.4.3 Analyzing Spring 2004 12th-grade Students 

Because the third follow-up student-level file contains records for individuals who were 
not in the 12th grade in Spring 2004 but were in the 10th grade in Spring 2002, analysts should 
first use the variable G12COHRT to restrict analysis to individuals who are in the population of 
Spring 2004 12th-grade students. Note that some individuals were not classified as members of 
the 12th grade in Spring 2004 as of the first follow-up but were determined to be members of
that population during the second follow-up. Because analysis of the 12th-grade population using 
first follow-up weights is intrinsically tied to the identification of members of the 12th-grade 
population as of the first follow-up, the value of G12COHRT was changed from 0 to 2 for 
individuals not classified as part of the 12th-grade population during the first follow-up but who 
were subsequently classified as part of the 12th-grade population for the second follow-up. When 
using first follow-up weights to analyze the 12th-grade population, records should be restricted 
to those where G12COHRT = 1 but, when using second or third follow-up weights, records 
should also include those records where G12COHRT = 2. 
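
The rule in the preceding paragraph can be sketched as a small predicate. It assumes that the round a weight belongs to can be read from the first two characters of its name, as the variable naming conventions suggest; the function name is hypothetical.

```python
def in_12th_grade_cohort(g12cohrt, weight_name):
    """Decide whether a record belongs in a 12th-grade cohort analysis.

    First follow-up weights: keep G12COHRT = 1 only.
    Second or third follow-up weights: keep G12COHRT = 1 or 2.
    """
    round_prefix = weight_name.upper()[:2]
    if round_prefix == "F1":
        return g12cohrt == 1
    if round_prefix in ("F2", "F3"):
        return g12cohrt in (1, 2)
    raise ValueError(f"{weight_name} is not a first, second, or third follow-up weight")
```

A sample member reclassified into the 12th-grade population during the second follow-up (G12COHRT = 2) is therefore excluded when F1QWT is used but included when F3QWT is used.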

In addition to using G12COHRT to restrict analysis to appropriate individuals, analysts 
should identify the time period(s) for which analysis will be conducted. A variety of analysis 
weights have been constructed that support analyzing the 12th-grade cohort over various time
periods. The possible time periods and the appropriate analysis weight are shown in table 45.

Table 45. Selecting weights to analyze Spring 2004 12th-grade population

BY  F1  F2  F3  TR  Available analysis weight(s)  Optional weights*
--  --  --  --  --  ----------------------------  ---------------------------
No response data used, only whole population composites
0   0   0   0   1   F1TRSCWT(#)
Response data from single wave
1   0   0   0   0   Not applicable
0   1   0   0   0   F1QWT, F1EXPWT
0   0   1   0   0   F2QWT
0   0   1   0   1   F2QTSCWT
0   0   0   1   0   F3QWT
0   0   0   1   1   F3QTSCWT
Response data from multiple waves
1   1   0   0   0   Not applicable
1   1   0   0   1   Not applicable
1   0   1   0   0   Not applicable
1   0   1   0   1   Not applicable
1   0   0   1   0   Not applicable
1   0   0   1   1   Not applicable
1   1   1   0   0   Not applicable
1   1   1   0   1   Not applicable
1   1   0   1   0   Not applicable
1   1   0   1   1   Not applicable
1   0   1   1   0   Not applicable
1   0   1   1   1   Not applicable
1   1   1   1   0   Not applicable
1   1   1   1   1   Not applicable
0   1   1   0   0   F2F1WT
0   1   1   0   1   No specific analysis weight   F3QTSCWT(*), F2F1WT
0   1   0   1   0   F3F1PNLWT
0   1   0   1   1   F3F1TSCWT
0   0   1   1   0   No specific analysis weight   F3QWT(*)
0   0   1   1   1   No specific analysis weight   F3QTSCWT(*), F3QWT
0   1   1   1   0   No specific analysis weight   F3F1PNLWT(*)
0   1   1   1   1   No specific analysis weight   F3F1TSCWT(*), F3F1PNLWT(*)

* Denotes weight that maximizes number of records.

# F1TRSCWT is defined for base-year or first follow-up respondents as well as first follow-up questionnaire-incapable sample
members who have transcript data collected during the first follow-up so some sample members who have a nonmissing value for
this weight would not be included in analyses that incorporate first follow-up interview data.

NOTE: BY = base year. F1 = first follow-up. F2 = second follow-up. F3 = third follow-up. TR = transcript.

SOURCE: U.S. Department of Education, National Center for Education Statistics, Education Longitudinal Study of 2002
(ELS:2002), Third Follow-up, 2012.

For any given response pattern across the four ELS:2002 study rounds, the analysis
weights that were specifically constructed to be representative of the Spring 2004 12th-grade
cohort using individuals who exhibited that particular response pattern are given in the column labeled
“Available analysis weights.” There are two reasons multiple weights may be available: some
weights were constructed that excluded questionnaire-incapable students and some weights were
constructed to generalize to the 12th-grade cohort using only individuals with high school
transcript data.

Where no specific analysis weights were constructed, such as for individuals who
responded in all four rounds of ELS:2002, the set of potential weights that could be used with
these individuals is given in the column labeled “Optional weights.” The weight marked with
an asterisk, in any given row, denotes the weight that provides the most records to use in an
analysis that incorporates data from the rounds identified in the first five columns of that row.

7.4.4 Notes on Selecting Weights for Analysis 

There are four general steps required to identify an appropriate weight for a particular 
analysis. These four steps are listed below followed by some examples of additional 
considerations that may be required, especially when trying to identify an appropriate weight 
when no specific analysis weight was constructed for a particular analysis. 

• Identify the cohort of interest and refer to the appropriate tables in sections 7.4.2 and 
7.4.3 to identify the subset of all possible weights that may be used to make inference 
about that cohort. Also ensure that the appropriate cohort flag variable is used to restrict 
analysis to the individuals in the cohort of interest. 

• Identify the time periods that contain data that are to be used in analysis and refer to the 
appropriate table in sections 7.4.2 and 7.4.3 to identify the available appropriate 
weights. 44 

• If multiple weights are available for the given cohort and time period, select one weight 
for analysis by, for example, reviewing table 21 in section 6.2.2.5, which lists all weights
and identifies the sample members with a non-zero value for each weight. 

• Examine the potential weights and other data to be used in analysis to identify missing 
data patterns. Missing data patterns may lead to the rejection of some weights from 
consideration. 
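
The last two steps above, comparing candidate weights and checking missing data patterns, can be sketched as a count of usable records per candidate weight. This is an illustration only: the record layout (one dict per sample member), the function names, and the analysis variable names in the usage note are hypothetical, and only the reserve codes -3, -4, -8, and -9 are screened. Whether a legitimate skip should count as unusable depends on the analysis.

```python
# Reserve codes screened in this sketch; adjust to suit the analysis.
RESERVE_CODES = (-3, -4, -8, -9)


def usable_records(records, cohort_flag, cohort_values, weight, analysis_vars):
    """Return the records usable with one candidate weight.

    A record is kept when it is in the cohort of interest, has a positive
    value for the weight, and has no missing or reserve-coded analysis variable.
    """
    kept = []
    for record in records:
        if record.get(cohort_flag) not in cohort_values:
            continue
        if (record.get(weight) or 0) <= 0:
            continue
        if any(
            record.get(var) is None or record.get(var) in RESERVE_CODES
            for var in analysis_vars
        ):
            continue
        kept.append(record)
    return kept


def compare_weights(records, cohort_flag, cohort_values, weights, analysis_vars):
    """Count usable records under each candidate weight."""
    return {
        weight: len(
            usable_records(records, cohort_flag, cohort_values, weight, analysis_vars)
        )
        for weight in weights
    }
```

For instance, compare_weights(records, "G10COHRT", {1}, ["F3BYPNLWT", "F3F1PNLWT"], ["BYVAR", "F3VAR"]) shows how many 10th-grade cohort records survive under each weight when a base-year and a third follow-up variable are both required.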

There are three topics associated with the ELS:2002 sample that warrant specific 
discussion because of their impact on weight construction and, therefore, on the appropriateness 
of weights for particular analyses. These three topics are sample freshening, expanded sample 
members, and high school transcript data collection. 

44 If analysis only uses data from a single time point, then a weight is available to support that analysis. As a general rule, if
analyzing data from multiple time points, consider using weights constructed for the latest time point.

7.4.4.1 Sample Freshening and First Follow-up New Participant Supplement

Sample freshening occurred in the ELS:2002 first follow-up to supplement the existing
ELS:2002 base-year sample so that the updated sample would be representative of the Spring
2004 12th-grade population as well as the original Spring 2002 10th-grade population. This
freshening of the ELS:2002 base-year sample introduced some nuances that bear additional
discussion. The individuals added to the ELS:2002 sample in the first follow-up represent that
portion of the 2004 senior population who were not in the 10th grade in 2002. The freshened
sample members are not part of the 10th-grade cohort but are part of the 12th-grade cohort.
Because the freshened sample members did not participate in the base year, a supplemental
questionnaire was used in the first follow-up to try to collect a minimum amount of information
from them.

The new participant supplement was also used to collect information in the first follow- 
up from base-year nonrespondents. An example of the impact of the new participant supplement 
on weight construction arises when considering the two weights F1QWT and F1PNLWT for use 
in analysis of the 10th-grade cohort. The weight F1PNLWT will exclude first follow-up
respondents who did not respond in the base year. The single round weight, F1QWT, will,
however, include base-year nonrespondents who responded in the first follow-up. Both weights
may be used to analyze the 10th-grade cohort but may produce different estimates for specific
analyses not just because of the difference between a longitudinal and a single round weight but 
also because of inclusion of base-year nonrespondents in the first follow-up. 

7.4.4.2 Expanded Sample Members 

In each round of ELS:2002, some sample members were questionnaire-incapable; 
however, additional information was often collected from other sources for those sample 
members. To support analyses of the data that were collected on questionnaire-incapable sample 
members, expanded sample weights were constructed in some rounds. These expanded sample 
weights are non-zero for respondents as well as questionnaire-incapable sample members. 

As an illustration, consider two weights constructed in the first follow-up to permit analysis
of the 10th- and 12th-grade cohorts as of the first follow-up: F1QWT and F1EXPWT. The two
weights have non-zero values for slightly different sets of first follow-up respondents. Students
who responded in the first follow-up or were questionnaire-incapable for the first follow-up have
a non-zero value for F1EXPWT. Students who responded in the first follow-up have a non-zero
value for F1QWT but students who were questionnaire-incapable for the first follow-up have a
zero value for F1QWT.

The second and third follow-up weights are set to zero for students who were 
questionnaire-incapable as of the corresponding study round. For example, F2QWT is set to 0 for 
sample members who were questionnaire-incapable as of the second follow-up and F2QWT is 
designed to represent that population of capable students as of the second follow-up. Note that 
the questionnaire capability status of students may change over time so students may be 
represented in some rounds and not other rounds. 

7.4.4.3 High School Transcripts 

High school transcripts were collected after the first follow-up for base-year respondents 
as well as first follow-up respondents. A series of high school transcript weights were
constructed in the first, second, and third follow-up to support the analysis of transcript data
along with interview data. 

F1TRSCWT is unique among the set of high school transcript weights in that it is non- 
zero for first follow-up respondents with transcript data and non-zero for first follow-up 
nonrespondents who responded in the base year but who also have transcript data. Because of its 
construction, some sample members with a non-zero value of this weight will not have first 
follow-up interview data so these sample members would drop out of an analysis that seeks to 
incorporate first follow-up interview data with transcript data. Care should be taken to consider 
the set of data to be used with the analysis weight F1TRSCWT to ensure that not too many 
records with a nonmissing weight value are excluded. As a general rule of thumb, F1TRSCWT 
will most likely be the best available weight to use when incorporating high school transcript 
data in analyses. Because F1TRSCWT is not adjusted to account for incomplete transcripts, only 
whether a transcript was provided, some additional analysis may be necessary to account for 
missing data in any given analysis. 
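
A minimal sketch of the check suggested above: before pairing F1TRSCWT with first follow-up interview data, count how many weighted records would drop out. The record layout and the indicator name HAS_F1_INTERVIEW are hypothetical; in practice the analyst would derive such an indicator from response status or from the analysis variables themselves.

```python
def transcript_weight_coverage(records, weight="F1TRSCWT", interview_flag="HAS_F1_INTERVIEW"):
    """Summarize how many weighted records also have first follow-up interview data.

    records is assumed to be a list of dicts keyed by variable name, with
    interview_flag truthy when first follow-up interview data are present.
    """
    weighted = [r for r in records if (r.get(weight) or 0) > 0]
    kept = [r for r in weighted if r.get(interview_flag)]
    total_weight = sum(r[weight] for r in weighted)
    kept_weight = sum(r[weight] for r in kept)
    return {
        "weighted_records": len(weighted),
        "kept_records": len(kept),
        # Share of the weighted total that remains after requiring interview data.
        "kept_weight_share": kept_weight / total_weight if total_weight else None,
    }
```

A kept_weight_share well below 1 would suggest that too many records with a non-zero weight are being excluded and that another weight, or an adjustment, should be considered.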

7.4.4.4 Missing Data

The three topics discussed in brief above — sample freshening, expanded sample 
members, and high school transcripts — are designed to illustrate three specific considerations 
that deal with identification of missing data patterns. The choice of appropriate analysis weight 
must take into account not just the available weights but the set of data to be used in a given 
analysis. The relationships between response status, weights, and data should be reviewed to 
ensure that the selected weight is appropriate for the analysis at hand. The review of missing data 
patterns is most important when considering analyses that incorporate data collected across 
multiple ELS:2002 rounds.